A Privacy-Preserving and Standard-Based Architecture for Secondary Use of Clinical Data
Academic Article
Publication Date:
2022
abstract:
The heterogeneity of clinical data formats and standards, which span structured, semi-structured, and
unstructured data, together with the sensitive information these data contain, requires specific
approaches and methodologies capable of extracting the valuable information buried in them. Although
many challenges remain only partially addressed when this information must be processed and reused,
the most recent techniques based on machine learning and big data analytics can support the
information extraction process for the secondary use of clinical data. In particular, these
techniques can facilitate the transformation of heterogeneous data into a common standard format.
They can also be exploited to define anonymization or pseudonymization approaches that respect the
privacy requirements of the General Data Protection Regulation (GDPR), the Health Insurance Portability
and Accountability Act (HIPAA), and other national and regional laws. Compliance with these laws
requires that only de-identified clinical and personal data be processed for secondary analyses,
particularly when data are shared or exchanged across institutions.
This work proposes a modular architecture that collects clinical data from heterogeneous sources
and transforms them into data suitable for secondary uses such as research, governance, and medical
education. The architecture employs dedicated modules and algorithms to carry out the transformations
(pseudonymization and standardization) required for secondary use, and provides efficient tools to
support data retrieval and analysis. Preliminary experimental tests show good accuracy in quantitative
evaluations.
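The abstract does not state how the pseudonymization and standardization steps are implemented. The Python sketch below only illustrates the general idea under explicit assumptions; it is not the authors' method. It uses keyed hashing (HMAC-SHA256), one common pseudonymization technique, and maps a hypothetical local record (fields `patient_id`, `name`, `sex`, `birth_date`) onto a minimal HL7 FHIR `Patient` resource. The secret key, the field names, and the identifier system `urn:example:pseudonym` are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical secret key, assumed to be held only by the pseudonymization module.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map a direct identifier to a stable pseudonym using keyed HMAC-SHA256."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def to_fhir_patient(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Convert a hypothetical local record into a minimal FHIR Patient resource.

    Direct identifiers (e.g. the name) are dropped; the local patient ID is
    replaced by its pseudonym, and the birth date is generalized to the year.
    """
    return {
        "resourceType": "Patient",
        "identifier": [{
            "system": "urn:example:pseudonym",  # hypothetical identifier namespace
            "value": pseudonymize(record["patient_id"], key),
        }],
        # FHIR administrative gender codes: male | female | other | unknown
        "gender": {"M": "male", "F": "female"}.get(record.get("sex"), "unknown"),
        # FHIR date values may be year-only ("YYYY"); keep just the year.
        "birthDate": record["birth_date"][:4],
    }


if __name__ == "__main__":
    local = {"patient_id": "PAT-000123", "name": "Jane Doe",
             "sex": "F", "birth_date": "1980-05-17"}
    print(json.dumps(to_fhir_patient(local), indent=2))
```

Keyed hashing keeps pseudonyms stable across sources, so records of the same patient can still be linked, while only a holder of the key can recompute the mapping. Because the mapping is reversible in that sense, the output is pseudonymized rather than anonymized data under the GDPR.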
Iris type:
01.01 Journal article
Keywords:
ETL architecture; secondary use of clinical data; HL7 FHIR; information retrieval; privacy laws; pseudonymization.
List of contributors:
Silvestri, Stefano; Ciampi, Mario; Sicuranza, Mario
Published in: