U01CA274576
Cooperative Agreement
Overview
Grant Description
Robust privacy preserving distributed analysis platform for cancer research: addressing data bias and disparities - project summary.
Privacy-preserving distributed analysis has gained increasing interest in the broad biomedical research community in recent years, as it can a) eliminate the need to create, maintain, and secure access to central data repositories, b) minimize the need to disclose protected health information outside the data-owning entity, and c) mitigate many security, proprietary, privacy, and other concerns. As such, it offers great promise for lowering regulatory and other hurdles to collaboration across multiple institutions and for enhancing public trust in biomedical research.
Equally important, analysis of health data from multiple institutions across the US would yield more robust and generalizable findings. This is particularly relevant in cancer disparities research, as the sample size for minority groups at any one institution can be very small. However, significant methodological gaps remain in the current state of the art for privacy-preserving distributed analysis. Most notably, missing data present significant challenges, as they are ubiquitous in biomedical data including, but not limited to, electronic health records (EHR).
It is well known that missing data are a major source of bias in EHR. For example, patients from minority groups and those who have less access to private insurance tend to have more missing data in their EHR. Biased data resulting from missing data are known to yield unfair statistical and machine learning models, which in turn can perpetuate and exacerbate health inequities and disparities. There has been no work on principled approaches for properly handling missing data in distributed analysis beyond our recent work.
In addition, it is well known that distributed analysis is still at risk of revealing important individual-level information and lacks a rigorous guarantee in the sense of differential privacy, the prevailing notion and metric for privacy protection. To address these significant limitations, we propose three specific aims.
In Aim 1, we will refine and develop state-of-the-art imputation methods for handling missing data in distributed analysis and develop advanced functionalities for enhanced privacy protection through differential privacy control and homomorphic encryption.
Building on the methods developed in Aim 1, we will develop an open-source and open-access distributed analysis platform that includes a robust system architecture and user-friendly GUI in Aim 2.
We will assess and validate our distributed analysis platform using real-world use cases in cancer disparities research in Aim 3. With the enhanced privacy protection, our proposed distributed analysis platform will have the potential to further enhance public trust and lower hurdles for collaboration across multiple institutions in cancer research.
As such, our platform will enable researchers to use more information and less biased data in cancer research, enhance the validity, robustness, and generalizability of research findings, and offer substantial benefits to research in areas including, but not limited to, cancer disparities and informatics practice.
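The summary names differential privacy as the target guarantee but does not describe a mechanism. As a rough illustration only, and not the project's method, the sketch below applies the generic Laplace mechanism to a count shared by several sites. The function names, the example counts, and the assumption that each patient appears at only one site are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_site_count(local_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a site's count with Laplace noise added before it leaves the site.

    Adding or removing one patient changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return local_count + laplace_noise(1.0 / epsilon, rng)


def pooled_count(site_counts, epsilon: float, seed=None) -> float:
    """Sum the noisy counts; the coordinator never sees a raw site count."""
    rng = random.Random(seed)
    return sum(private_site_count(c, epsilon, rng) for c in site_counts)


if __name__ == "__main__":
    # Hypothetical counts of one patient subgroup at three sites.
    print(pooled_count([12, 7, 30], epsilon=1.0, seed=0))
```

A smaller `epsilon` gives stronger privacy and a noisier pooled count. With small site-level counts, such as those for minority groups at a single institution, that noise can be large relative to the signal.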
Funding Goals
TO IDENTIFY CANCER RISKS AND RISK REDUCTION STRATEGIES, TO IDENTIFY FACTORS THAT CAUSE CANCER IN HUMANS, AND TO DISCOVER AND DEVELOP MECHANISMS FOR CANCER PREVENTION AND PREVENTIVE INTERVENTIONS IN HUMANS. RESEARCH PROGRAMS INCLUDE: (1) CHEMICAL, PHYSICAL AND MOLECULAR CARCINOGENESIS, (2) SCREENING, EARLY DETECTION AND RISK ASSESSMENT, INCLUDING BIOMARKER DISCOVERY, DEVELOPMENT AND VALIDATION, (3) EPIDEMIOLOGY, (4) NUTRITION AND BIOACTIVE FOOD COMPONENTS, (5) IMMUNOLOGY AND VACCINES, (6) FIELD STUDIES AND STATISTICS, (7) CANCER CHEMOPREVENTION AND INTERCEPTION, (8) PRE-CLINICAL AND CLINICAL AGENT DEVELOPMENT, (9) ORGAN SITE STUDIES AND CLINICAL TRIALS, (10) HEALTH-RELATED QUALITY OF LIFE AND PATIENT-CENTERED OUTCOMES, AND (11) SUPPORTIVE CARE AND MANAGEMENT OF SYMPTOMS AND TOXICITIES. SMALL BUSINESS INNOVATION RESEARCH (SBIR) PROGRAM: TO EXPAND AND IMPROVE THE SBIR PROGRAM, TO STIMULATE TECHNICAL INNOVATION, TO INCREASE PRIVATE SECTOR COMMERCIALIZATION OF INNOVATIONS DERIVED FROM FEDERAL RESEARCH AND DEVELOPMENT FUNDING, TO INCREASE SMALL BUSINESS PARTICIPATION IN FEDERAL RESEARCH AND DEVELOPMENT, AND TO FOSTER AND ENCOURAGE PARTICIPATION IN INNOVATION AND ENTREPRENEURSHIP BY WOMEN AND SOCIALLY/ECONOMICALLY DISADVANTAGED PERSONS. SMALL BUSINESS TECHNOLOGY TRANSFER (STTR) PROGRAM: TO STIMULATE AND FOSTER SCIENTIFIC AND TECHNOLOGICAL INNOVATION THROUGH COOPERATIVE RESEARCH AND DEVELOPMENT CARRIED OUT BETWEEN SMALL BUSINESS CONCERNS AND RESEARCH INSTITUTIONS, TO FOSTER TECHNOLOGY TRANSFER THROUGH COOPERATIVE RESEARCH AND DEVELOPMENT BETWEEN SMALL BUSINESS CONCERNS AND RESEARCH INSTITUTIONS, TO INCREASE PRIVATE SECTOR COMMERCIALIZATION OF INNOVATIONS DERIVED FROM FEDERAL RESEARCH AND DEVELOPMENT FUNDING, AND FOSTER PARTICIPATION IN INNOVATION AND ENTREPRENEURSHIP BY WOMEN AND SOCIALLY/ECONOMICALLY DISADVANTAGED PERSONS.
Grant Program (CFDA)
Awarding / Funding Agency
Place of Performance
Pennsylvania
United States
Geographic Scope
State-Wide
Related Opportunity
Analysis Notes
Amendment: Since the initial award, total obligations have increased 92%, from $411,901 to $788,881.
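The 92% figure can be checked directly from the two obligation amounts in this record. The variable names below are illustrative.

```python
# Obligation amounts as stated in this award record.
initial_obligation = 411_901
current_obligation = 788_881

# Percentage increase relative to the initial award.
increase_pct = (current_obligation - initial_obligation) / initial_obligation * 100

print(f"{increase_pct:.1f}%")
```

The increase is about 91.5%, which the record rounds to 92%.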
Trustees Of The University Of Pennsylvania was awarded Cooperative Agreement U01CA274576, worth $788,881, from the National Cancer Institute in June 2023, with work to be completed primarily in Pennsylvania, United States. The grant has a duration of 3 years and was awarded through assistance program 93.393 Cancer Cause and Prevention Research. The Cooperative Agreement was awarded through grant opportunity Early-Stage Development of Informatics Technologies for Cancer Research and Management (U01 Clinical Trial Optional).
Status
(Ongoing)
Last Modified 3/20/25
Period of Performance
Start Date: 6/1/23
End Date: 5/31/26
Funding Split
Federal Obligation: $788.9K
Non-Federal Obligation: $0.0
Total Obligated: $788.9K
Additional Detail
Award ID FAIN
U01CA274576
SAI Number
U01CA274576-3388359856
Award ID URI
SAI UNAVAILABLE
Awardee Classifications
Private Institution Of Higher Education
Awarding Office
75NC00 NIH National Cancer Institute
Funding Office
75NC00 NIH National Cancer Institute
Awardee UEI
GM1XX56LEP58
Awardee CAGE
7G665
Performance District
PA-90
Senators
Robert Casey
John Fetterman
Budget Funding
Federal Account | Budget Subfunction | Object Class | Total | Percentage |
---|---|---|---|---|
National Cancer Institute, National Institutes of Health, Health and Human Services (075-0849) | Health research and training | Grants, subsidies, and contributions (41.0) | $411,901 | 100% |
Modified: 3/20/25