About this Research Topic
This Research Topic examines research on automated data curation and data governance automation, with the aim of developing unsupervised methods and techniques that automate data curation and data governance processes to the greatest extent possible. Richard Wang has labeled the goal of fully automating data cleaning and integration a “data washing machine,” and John R. Talburt has led some of the initial development. Similar work has begun in industry to develop methods for automating many data governance tasks, such as “positive data control” for maintaining the enterprise data catalog. Replacing human analysis with scalable, unsupervised automation of these processes will not be easy, but it is necessary to keep pace with the increasing volume and variety of data driving modern decision systems.
Submissions to this Research Topic can address, but are not limited to, the following themes within the context of automated methods for:
• Data quality assessment and metrics
• Generating data quality validation rules
• Data cleansing (data washing machines)
• Spelling correction
• Missing value imputation
• Data standardization
• Multi-source data integration
• Entity and identity resolution
• Data governance policy and standards conformance
• Metadata generation
• Data catalog initialization and setup
• Updating data catalogs and business glossaries
• Data operations logging and data provenance
• Positive data control
• Generating data products
• Data as a service
• Data archiving, deletion, and disposal
Keywords: Data curation, data governance, data life cycle, data process automation, unsupervised data operations
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.