D4.4 Initial version of the ProCAncer-I platform
This document outlines the architecture of the initial version of the ProCAncer-I platform and describes its storage implementations, services and tools, AI model framework, and monitoring and logging systems. At this stage of the project, the emphasis is on the upload of the retrospective data and its constituent processes: the preparation of the data in accordance with the ethical and security-related requirements (anonymization), its annotation and curation using domain-specific tools and metadata standards, and its efficient storage management to support scalable retrieval of structured information. In more detail, this document focuses on the following aspects of the architecture:
- The data upload process from the end user’s point of view and the eCRF application. We briefly describe the selected “use cases”, i.e., the classification of patient cases used to answer the clinical questions that are the primary focus of the project;
- The data annotation and curation tools that are used for imaging-related tasks such as segmentation, motion correction, and co-registration;
- The core data management components, i.e., the DICOM Image Repository and the Metadata Repository, and their crucial role in the data upload process;
- The security-related services that support the authentication and authorization of users and services, and their integration with the ELIXIR federated identity infrastructure for cross-organization single sign-on.
Furthermore, this document provides the initial design of the ProCAncer-I Machine Learning/AI framework, which will allow the deployment of state-of-the-art model development approaches (“MLOps”) supporting the whole model lifecycle: from model building and tuning with “experiment tracking”, to model deployment and serving, and to monitoring of “real world” performance, with alerting mechanisms to address challenges such as data drift and concept drift.
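As an illustration of the monitoring and alerting step mentioned above, the following is a minimal sketch of a data drift check based on the Population Stability Index (PSI). It is not the ProCAncer-I implementation: the function names, the number of bins, and the alert threshold of 0.2 (a commonly quoted rule of thumb) are assumptions made for this example. It covers data drift on a single numeric feature only; detecting concept drift would additionally require ground-truth labels for the deployed model's predictions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,      # assumed value, chosen for illustration
    eps: float = 1e-6,     # floor that avoids log(0) for empty bins
) -> float:
    """Compare the distribution of one feature in two samples.

    Bins are equal-width and derived from the reference sample;
    current values outside that range are clamped to the edge bins.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    # PSI = sum over bins of (cur - ref) * ln(cur / ref); always >= 0.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_alert(
    reference: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.2,  # hypothetical alert threshold
) -> bool:
    """Return True when the PSI exceeds the alert threshold."""
    return population_stability_index(reference, current) > threshold


# Synthetic values standing in for a feature seen at training time
# and the same feature observed after deployment.
training_values = [i / 100 for i in range(100)]
shifted_values = [0.5 + i / 200 for i in range(100)]
```

In this sketch, `drift_alert(training_values, training_values)` does not raise an alert, while `drift_alert(training_values, shifted_values)` does, because the shifted sample leaves the lower half of the reference range empty. In a deployed setting such a check would typically run periodically per feature and feed the alerting mechanism.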
The whole infrastructure is currently deployed on a commercial cloud, but in a “cloud agnostic” way, to avoid vendor lock-in and to facilitate porting to a “cloud native” but provider-independent solution in a future version.