Big Data

How Big Data and large data sets are helping researchers ask and answer previously unexplored questions about hematological diseases.

by Jan Geißler, co-founder of the CML Advocates Network

Big Data – the term has been a topic of controversial debate for several years.

Why, despite all the criticism, should we gather and analyze large amounts of data? Why does this hold great potential for health research?

How does the HARMONY project launched by the EU’s Innovative Medicines Initiative make use of Big Data, and to what extent do patient organizations have a central role to play in this?

And what role does it play in CML?

Big Data is a term that has been circulating through the media, politics and public discourse for several years now. Critics usually use it to refer to alleged surveillance measures by governmental institutions, such as data retention, or to the obscure business models of large social media platforms. But it is worth taking a closer look: where critics see great dangers in collecting and analyzing large amounts of data, proponents emphasize the opportunities that Big Data brings, for example for health research.

– More opportunity than risk: “Choosing the optimal therapy is vital for patients.” –

What potential does Big Data hold for health research? In light of the evolution from an organ-based to a personalized medical approach based on measurable biological attributes – for example, protein molecules in the blood that indicate a specific cancer – it becomes apparent that Big Data can be an important tool in establishing more fine-tuned research approaches and treatment methods. These can be more specifically tailored to a disease pattern and a patient group – especially in the case of rare diseases.

By compiling anonymized patient data sets, such as clinical, imaging or molecular genetic data, and analyzing them (data mining), researchers can gain new insights into disease development and prevention, diagnosis and therapy. In concrete terms, this means using artificial intelligence to record and interpret the data sets of as many patients as possible in order to identify previously unknown correlations and thus, for instance, to favor certain therapeutic methods or rule out ineffective measures from the outset.

These insights could have a significant impact on the likelihood of a successful therapy, says Jan Geißler, who, with his patient organization LeukaNET and now also the CML Advocates Network, is involved in a project that aims to harness the potential of Big Data to develop better treatment methods and strategies for patients with blood cancers: HARMONY, a pan-European project of the EU Innovative Medicines Initiative launched in 2017. “Choosing the optimal therapy and avoiding over-, under- or even ineffective treatment is vital for patients,” continues Geißler, who leads one of the work packages in HARMONY. He adds that it is very difficult to generate results such as those already produced by HARMONY from small data sets in individual clinical trials, “which is why the patient community is participating in HARMONY: to drive research for better treatment outcomes for patients.”

– Project HARMONY: seven blood cancers in focus –

Although diagnosis and treatment methods for blood cancers have greatly improved in recent years, many of these diseases are still incurable. For this reason, and with the help of 94 partners and members, including seven patient organizations such as LeukaNET as well as pharmaceutical companies and university hospitals, HARMONY gathers genomic data from thousands of patients affected by seven blood cancers: acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), chronic lymphocytic leukemia (CLL), multiple myeloma (MM), myelodysplastic syndromes (MDS), non-Hodgkin lymphoma (NHL), and pediatric and adolescent blood cancers. HARMONY then evaluates this information with respect to guiding research questions, for example: How does the body function under pathological change? What mechanisms lead to these changes? What molecular characteristics do cancer cells have?

Patient organizations have been involved from the very beginning and have played an essential role in all of these aspects, in the review and evaluation of Big Data research proposals, and in the ethics review of the project. The project has already generated insights by evaluating genomic data from nearly 5,000 AML patients and more than 7,000 patients with multiple myeloma.

Another central goal of the project is to develop core outcome sets for these diseases. A core outcome set defines what researchers should measure in a disease and how, and helps them to define meaningful endpoints, i.e., the goals of a clinical trial, for future studies. The project has defined a core outcome set for acute myeloid leukemia and is currently running a Delphi-based process.

– Requirements on the data –

However, in order to carry out a Big Data project of this kind, a number of requirements have to be met, including requirements on the quality of the data. This is because the data are characterized by high complexity, little structure and high velocity, and a large amount of data alone does not automatically provide new insights. For this reason, the source data must be as structured and consistent as possible and have to be standardized with the help of analytical and statistical evaluation tools.

Moreover, tools are needed that either anonymize personal data, i.e., delete all identifying characteristics such as the patient’s name, date of birth or location, or replace this information with pseudonyms. In the latter case, the identifying characteristics must be stored by a data trustee, separate from the remaining data. This step is particularly necessary with regard to the General Data Protection Regulation (GDPR), which has applied since 2018. It is especially important in the healthcare sector: anonymized data are no longer covered by the GDPR, while pseudonymized data still count as personal data under the GDPR but can be used for research purposes with appropriate safeguards.
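The separation between research data and identifying characteristics described above can be illustrated with a minimal sketch. It is not HARMONY's actual tooling: the field names, the `pseudonymize` function and the in-memory "trustee table" are assumptions chosen for illustration only.

```python
import secrets

# Hypothetical names of the identifying fields; real data sets will differ.
IDENTIFYING_FIELDS = ("name", "date_of_birth", "location")


def pseudonymize(records):
    """Split patient records into research data and a separate key table.

    Returns (research_data, trustee_table). The research data carry only a
    random pseudonym; the trustee table maps each pseudonym back to the
    identifying fields and would be held by the data trustee alone.
    """
    research_data = []
    trustee_table = {}
    for record in records:
        # Random pseudonym that cannot be derived from the patient's identity.
        pseudonym = secrets.token_hex(8)
        while pseudonym in trustee_table:  # extremely unlikely collision
            pseudonym = secrets.token_hex(8)

        trustee_table[pseudonym] = {
            field: record[field] for field in IDENTIFYING_FIELDS if field in record
        }
        research_record = {
            key: value for key, value in record.items()
            if key not in IDENTIFYING_FIELDS
        }
        research_record["pseudonym"] = pseudonym
        research_data.append(research_record)
    return research_data, trustee_table
```

In this sketch, dropping the trustee table entirely would correspond to the anonymization route: the remaining records could no longer be linked back to a patient through the pseudonym, although real anonymization also has to consider indirect identifiers in the remaining fields.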

In health research in particular, pseudonymization is often preferred because genomic data sets are highly complex and anonymizing them would distort them and thus make them worthless for research. In order to comply with data protection requirements, HARMONY renders all data sets unrecognizable using a two-step pseudonymization process, adds further protection through access mechanisms, and stores the data on a specially created data platform that complies with EU directives. To make them usable, the data records, which stem from different clinical trial databases and differ in structure and composition, are also harmonized, i.e., standardized. Currently, the HARMONY database contains information from 45,000 patients with one of these seven blood cancers.
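What harmonization means in practice can be sketched with a small example: records from two sources that name and scale the same measurements differently are mapped onto one common structure. The source names, field names and the unit conversion below are hypothetical and do not describe the HARMONY platform's actual data model.

```python
# Hypothetical mapping from each source's field names to a common schema.
FIELD_MAPS = {
    "registry_a": {"dx": "diagnosis", "age_years": "age", "hb_g_dl": "hemoglobin_g_dl"},
    "trial_b": {"Diagnosis": "diagnosis", "Age": "age", "Hb_g_l": "hemoglobin_g_dl"},
}

# Hypothetical unit conversions, keyed by (source, source field).
# Here trial_b reports hemoglobin in g/L, the common schema uses g/dL.
UNIT_CONVERSIONS = {
    ("trial_b", "Hb_g_l"): lambda value: value / 10.0,
}


def harmonize(source, record):
    """Translate one source record into the common schema.

    Fields without a mapping are dropped so that every harmonized record
    has the same, predictable set of possible keys.
    """
    mapping = FIELD_MAPS[source]
    harmonized = {}
    for field, value in record.items():
        if field not in mapping:
            continue
        convert = UNIT_CONVERSIONS.get((source, field))
        harmonized[mapping[field]] = convert(value) if convert else value
    return harmonized


record_a = harmonize("registry_a", {"dx": "AML", "age_years": 58, "hb_g_dl": 10.2})
record_b = harmonize("trial_b", {"Diagnosis": "AML", "Age": 61, "Hb_g_l": 95, "site": "X"})
```

After this step both records share the keys `diagnosis`, `age` and `hemoglobin_g_dl`, so they can be analyzed together even though they came from differently structured databases.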

– HARMONY PLUS expands the research project to include CML and other diseases –

HARMONY PLUS, a second project launched in October last year as part of a public-private partnership, builds on HARMONY’s structures and focuses on blood cancers not covered by HARMONY, namely chronic myeloid leukemia (CML), polycythemia vera (PV), essential thrombocythemia (ET), myelofibrosis, Hodgkin’s lymphoma, Waldenström’s disease and other rare blood cancers.

Through LeukaNET, the CML Advocates Network has joined HARMONY PLUS to contribute to upcoming CML-related Big Data research on the project’s platform.

Moreover, in addition to incorporating systematically collected, experience-based data, HARMONY PLUS focuses even more than HARMONY on the involvement of patient organizations and the patient community “to drive research for better treatment outcomes for patients,” Jan Geißler points out. After all, patients are not only the primary source of these data and able to contribute significantly to gaining new insights; they should also be the primary beneficiaries of any new development in treatment.

For more information, please visit “Harmony, Healthcare Alliance for Resourceful Medicine Offensive against Neoplasms in Hematology”.
