Data Preparation for Artificial Intelligence

17.9K views · 21 authors · 4 articles

Editors

2

Patrick Bangert

Searce Technologies Inc.

City of Hope National Medical Center

About

Much research has been conducted and published on the application of artificial intelligence (AI) to radiology and medicine. Most of this research focuses on three main topics: (1) the kind of AI model best suited to the task at hand, (2) the manner in which the AI model was trained and tuned to achieve the highest accuracy or performance on the task, and (3) the practicalities of applying an AI model to the professional practice of radiology, with all of its organizational and financial ramifications.

The elephant in the proverbial room is the dataset used. All data scientists and AI engineers are painfully aware of the difficulties in collecting, curating, cleaning, governing, and securing a quality dataset that is representative, significant and informative about the situation to be modeled. In fact, getting to the point of having this kind of dataset can easily consume over 80% of the temporal and financial resources for the entire project.

In this Research Topic we will focus on ‘data-centric’ radiology AI. The papers will address and exemplify best practices for moving from a good idea to a great dataset. Model selection, training, tuning, and deployment are left to the large body of literature that already exists.

Modes of collecting and storing radiological data are important not only for regulatory reasons but also because of practical challenges specific to the medical industry, such as incompatible electronic health record systems. We are also challenged by having to maintain significant amounts of metadata that are personal and sensitive. Use of the cloud is regarded as a panacea by some and as a precipice by others.

Most of the data that can be easily collected represents the ‘normal’ case. Often, however, the normal case is not what we wish to model. We may then have to curate the dataset so that the interesting cases are sufficiently well represented. For image data to be helpful for AI, it must usually be labeled. Labeling is a crucially important step that is often neglected, the black sheep of the AI family. Once the dataset grows beyond a certain size, the IT system managing the data becomes important and must be designed to serve that purpose.
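
As a rough illustration of curating a dataset so that rare cases are better represented, the sketch below applies random oversampling, one common technique among several (class weighting and targeted collection are alternatives). The function name `oversample_minority` and the labels `"normal"` and `"finding"` are hypothetical and chosen only for this example; it is a minimal sketch, not a recommended pipeline.

```python
from collections import Counter
import random


def oversample_minority(labels, seed=0):
    """Return dataset indices resampled so every class matches the largest class.

    Hypothetical helper for illustration: minority classes are repeated
    (sampled with replacement), so no new information is created.
    """
    rng = random.Random(seed)

    # Group the indices of the dataset by their class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # The largest class sets the target size for all classes.
    target = max(len(idxs) for idxs in by_class.values())

    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # Draw extra indices with replacement until the class reaches the target.
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))

    rng.shuffle(balanced)
    return balanced


# Assumed toy dataset: 95 'normal' studies and 5 studies with a finding.
labels = ["normal"] * 95 + ["finding"] * 5
indices = oversample_minority(labels)
counts = Counter(labels[i] for i in indices)
print(counts)
```

Oversampling should normally be applied to the training split only, since duplicated cases in a validation or test split would distort the reported performance.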

This Research Topic will cover all of these themes that are preparatory for AI. We find, nowadays, that the real lever in model quality is the dataset. Let’s investigate how to create the best datasets so as to achieve the best possible models.

In particular, we welcome articles that address questions including, but not limited to, the following:

1. How can we best collect radiological data from many disparate sources worldwide?

2. How can we take different legal frameworks (e.g. HIPAA and GDPR) into account when building such a dataset?

3. How can we effectively spot irrelevant, superfluous, damaged or otherwise unusable data in a large radiological dataset?

4. What are best practices for dealing with an imbalanced dataset, that is, a dataset in which the number of cases in one category far outweighs the number of cases in another?

5. How can we measure the significance and relevance of a dataset to a particular modeling question?

6. What are the best practices for labeling radiological images?

7. What are the best practices for radiological data governance?

8. What are the challenges in merging two existing datasets that may not share the same macroscopic statistical properties?

9. What pre-processing steps are helpful in order to prepare raw radiological data for AI?

10. And so on.
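
To make the dataset-merging question (item 8) more concrete, here is a minimal sketch of one possible check: the two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distributions of a single summary feature in two datasets. The feature used here (mean pixel intensity per image) and the sample values are assumptions for illustration only; a real comparison would look at many features and at acquisition metadata.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs of
    the two samples: 0.0 for identical distributions, 1.0 for samples that
    do not overlap at all.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Assumed toy values: mean pixel intensity per image from two hypothetical sites.
site_a = [0.41, 0.43, 0.44, 0.47, 0.50]
site_b = [0.58, 0.60, 0.61, 0.63, 0.66]
print(ks_statistic(site_a, site_b))
```

A large value suggests that the two sources differ systematically in that feature, for example because of different scanners or protocols, and that harmonization may be needed before merging.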

Coordinators

Nandha BALASUBRAMANIAM

University of Applied Sciences and Arts

Impact

17,864 Total views

12,485 Article views

4,002 Article downloads

1,377 Topic views

Published In

Frontiers in Radiology

Artificial Intelligence in Radiology
