Data mining and management: a conversation with Huan Liu

In the wake of recent controversies over data mining, Huan Liu, one of the people behind the groundbreaking TweetTracker technology, shares his insights into the value and privacy implications of big data and data mining.

— By Louisa Wood

Frontiers in Big Data: Chief Editor Huan Liu, Data Mining and Management section

With ‘big data’ rapidly becoming the new world currency, questions of privacy, property and identity preservation in an increasingly transparent landscape have generated public concern.

The new Data Mining and Management section of Frontiers in Big Data was launched in recognition that data management tools need to improve in order to complement, and safeguard against, increasingly powerful data-mining algorithms and ever-growing databases.

We asked Huan Liu, the section’s Chief Editor and leader of the Data Mining and Machine Learning Lab at Arizona State University, to tell us about his research and to share his academic insights into the value and privacy implications of big data and data mining.

Why do you think big data is so significant?

Data is abundant. Data is pervasive. Data is evolving. Above all, it is powerful. Whoever has access to the largest amount of high-quality data can gain the most accurate insights from it, and that advantage can be used to make well-informed decisions.

The concept of data mining is often surrounded by controversy, particularly relating to issues of privacy. Do we really have to lose our freedom in this digital age?

Data mining is a relatively new type of data analysis that is still developing and shaping itself, so it is not surprising that it raises various concerns, privacy among them. Privacy is a complex issue, especially when data contains personal information. The more data that is collected on an individual, the more targeted online services can become; the trade-off is that the individual’s data becomes less private. Researchers recognize the need to balance privacy against service relevance, and are therefore exploring new business models as well as novel approaches to privacy protection that do not sacrifice relevance.

Do you think big data will change our world?

Yes, big data does have transformative power, in the sense that “data is the new oil”: it is a new resource that we can tap into. Like any powerful resource, it can cut both ways; it could, for example, jeopardize or reinforce democracy, depending on how it is used and who uses it. As researchers, we are working to guard against its misuse. For example, some members of my group at Arizona State University are researching ways to detect and track ‘fake news’ on social media, which could help to ensure a democratic approach to data handling.

How are you using big data and data-mining in your research?

My work focuses on designing computational methods for data mining, machine learning and social computing, from basic research to real-world applications. One recently developed system is TweetTracker, a web-based system that collects and visualizes social media data. Because it gathers data from sites such as Twitter, Instagram, YouTube and VKontakte, first responders can get a more accurate picture of what is unfolding on the ground during crises and disasters, and can therefore provide targeted relief. TweetTracker has already been used by researchers and NGOs, for example to gather data on evacuation efforts during Hurricane Sandy in 2012.

Big data and data mining form a rich field of research. What is your vision for the Data Mining and Management specialty section?

My hope is that this specialty section will provide an interdisciplinary platform to integrate two areas of research that have previously been separate: data mining and data management. To this end, the Data Mining and Management section will foster conversation between research communities and industry, enabling fruitful collaborations and the exchange of high-quality research outputs. We aim to cover a broad range of topics within this framework, from privacy-preserving data sharing and scalable data mining to intelligent data management.

The Data Mining and Management section of Frontiers in Big Data welcomes high-quality article submissions and Research Topic proposals on a wide range of topics, including:

  • Intelligent data management

  • Information retrieval

  • Privacy-preserving data sharing and mining

  • Data visual analytics

  • Evaluation and validation

  • Trust and privacy

  • Cybersecurity in social data

  • Ethical issues in data mining and management

Visit the section website for further information and follow @FrontBigData on Twitter.