
Data Labeling and Curation at Scale (DLCS) for Machine Learning Algorithms

ID: DHS241-002 • Type: SBIR / STTR Topic

Description

The DHS Science & Technology Directorate (S&T) laboratories and DHS component operational partners generate large volumes of data (sometimes 10,000 or more measurements per day) from test events, prototype demonstrations, and targeted stream-of-commerce (SoC) data collections. These data are highly valuable to DHS and its R&D partners for developing next-generation detection algorithms, such as those used at airports in on-person and accessible-property screening to detect explosives and prohibited items. Currently, all collected data must be hand-annotated and stored on physical hard disks. This process is extremely time- and labor-intensive, and it limits DHS's ability to develop curated data sets and share data with R&D partners. R&D partners must also accept the data in whatever formats and labels were created by hand, because DHS cannot currently re-annotate or reformat existing data sets rapidly. DHS is seeking innovative techniques to accelerate its data collection, labeling, storage, and distribution processes and to make them more flexible. The current state of the art relies heavily on human labeling and on knowing the desired metadata and curation schemes a priori. Successful solutions will minimize the human intervention these tasks require, relying instead on automated software to handle most routine activities. The proposed solution may include commercial-off-the-shelf (COTS) modules, but the research should focus on novel data ingestion, labeling, and curation techniques. Any COTS modules included should meet Government-approved cybersecurity standards, such as FedRAMP approval and/or compliance with the FIPS 140-3 specification.
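
To make the re-annotation and reformatting gap concrete, here is a minimal Python sketch. It assumes that hand-made labels are stored as axis-aligned bounding boxes in absolute pixel corner coordinates, and that a partner wants normalized center/size boxes instead. The `CornerBox`, `CenterBox`, `to_center_normalized`, and `reformat_labels` names are hypothetical illustrations. They are not part of any DHS system or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CornerBox:
    """Hypothetical hand-made label: absolute pixel corner coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass(frozen=True)
class CenterBox:
    """Hypothetical partner format: box center and size, normalized to [0, 1]."""
    label: str
    cx: float
    cy: float
    w: float
    h: float


def to_center_normalized(box: CornerBox, img_w: int, img_h: int) -> CenterBox:
    """Convert one corner-format box into the normalized center/size format."""
    if box.x_max <= box.x_min or box.y_max <= box.y_min:
        raise ValueError(f"degenerate box: {box}")
    return CenterBox(
        label=box.label,
        cx=(box.x_min + box.x_max) / 2 / img_w,
        cy=(box.y_min + box.y_max) / 2 / img_h,
        w=(box.x_max - box.x_min) / img_w,
        h=(box.y_max - box.y_min) / img_h,
    )


def reformat_labels(boxes: list[CornerBox], img_w: int, img_h: int) -> list[CenterBox]:
    """Re-emit an existing data set's labels in the partner format without re-annotating."""
    return [to_center_normalized(b, img_w, img_h) for b in boxes]


if __name__ == "__main__":
    hand_labels = [CornerBox("prohibited_item", 100, 50, 300, 250)]
    print(reformat_labels(hand_labels, img_w=400, img_h=500))
    # prints: [CenterBox(label='prohibited_item', cx=0.5, cy=0.3, w=0.5, h=0.4)]
```

The design keeps ground truth in a single canonical form and generates each partner's format on demand. A new label convention then costs one conversion function rather than a new round of hand annotation.
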
Capabilities of particular interest include the ability to ingest specialized file formats such as Hierarchical Data Format (HDF), Digital Imaging and Communications in Security (DICOS, an adaptation of Digital Imaging and Communications in Medicine, or DICOM), and other defined but unusual data types. The solution should then process the data to assess complexity, identify common features and defined labels, and generate ground truth for these files. Areas of uncertainty may be flagged for later human review, and the resulting human-generated ground truth may then be analyzed to improve the automated tools. Once stored, the data should be easy to curate, reprocess (e.g., convert file formats or ground-truth formats), and distribute as packaged data sets. A successful solution should scale significantly to support long-term use by DHS.
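
The flag-for-review loop described above can be sketched as confidence-threshold triage. This minimal Python sketch assumes that an upstream automated labeler (not shown) has already ingested each HDF or DICOS file and produced per-item labels with confidence scores in [0, 1]. The `Scan`, `Prediction`, `TriageResult`, and `triage` names and the default 0.9/0.2 thresholds are hypothetical illustrations, not values taken from the topic.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    label: str
    confidence: float  # hypothetical automated labeler's score in [0, 1]


@dataclass
class Scan:
    scan_id: str
    predictions: list[Prediction]


@dataclass
class TriageResult:
    auto_labeled: dict[str, list[str]] = field(default_factory=dict)
    needs_review: list[str] = field(default_factory=list)


def triage(scans: list[Scan], accept: float = 0.9, reject: float = 0.2) -> TriageResult:
    """Split scans into machine-labeled ground truth and a human-review queue.

    Predictions scoring >= accept are kept as provisional ground truth,
    predictions scoring <= reject are dropped as noise, and any prediction
    in between sends the whole scan to human review.
    """
    if not 0.0 <= reject < accept <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= reject < accept <= 1")
    result = TriageResult()
    for scan in scans:
        if any(reject < p.confidence < accept for p in scan.predictions):
            result.needs_review.append(scan.scan_id)
        else:
            result.auto_labeled[scan.scan_id] = [
                p.label for p in scan.predictions if p.confidence >= accept
            ]
    return result


if __name__ == "__main__":
    batch = [
        Scan("bag-001", [Prediction("threat_item", 0.97)]),
        Scan("bag-002", [Prediction("liquid", 0.55)]),
        Scan("bag-003", [Prediction("liquid", 0.05)]),
    ]
    r = triage(batch)
    print(r.auto_labeled)  # {'bag-001': ['threat_item'], 'bag-003': []}
    print(r.needs_review)  # ['bag-002']
```

A human would label the scans routed to `needs_review`. Those labels could then serve as additional training data or be used to recalibrate the thresholds, which is one way to realize the feedback loop the topic describes.
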

Overview

Response Deadline: Jan. 17, 2024 (past due)
Posted: Nov. 8, 2023
Open: Dec. 15, 2023
Set Aside: Small Business (SBA)
Place of Performance: Not Provided

Program: SBIR Phase I
Structure: Contract
Phase Detail: Phase I establishes the technical merit, feasibility, and commercial potential of the proposed R/R&D effort and determines the quality of performance of the small business awardee organization.
Duration: 6 Months
Size Limit: 500 Employees

On 11/8/23, the Science and Technology Directorate issued SBIR / STTR Topic DHS241-002 for Data Labeling and Curation at Scale (DLCS) for Machine Learning Algorithms, due 1/17/24.
