The 9th IEEE International Conference on Data Science and Advanced Analytics

October 13-16, 2022
Online

Keynotes

Keynote Speaker 1

Professor Vipin Kumar

Professor, William Norris Endowed Chair in the Department of Computer Science and Engineering, University of Minnesota, USA.

Title:

Knowledge-Guided Machine Learning: A New Framework for Accelerating Scientific Discovery and Addressing Global Environmental Challenges

Abstract:

Process-based models of dynamical systems are often used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to incomplete or inaccurate representations of the physical processes being modeled. There is a tremendous opportunity to systematically advance modeling in these domains by using state-of-the-art machine learning (ML) methods that have already revolutionized computer vision and language translation. However, capturing this opportunity is contingent on a paradigm shift in data-intensive scientific discovery, since the "black box" use of ML often leads to serious false discoveries in scientific applications. Because the hypothesis space of scientific applications is often complex and exponentially large, an uninformed data-driven search can easily select a highly complex model that is neither generalizable nor physically interpretable, resulting in the discovery of spurious relationships, predictors, and patterns. This problem becomes worse when there is a scarcity of labeled samples, which is quite common in science and engineering domains.

This talk makes the case that in real-world systems that are governed by physical processes, there is an opportunity to take advantage of fundamental physical principles to inform the search for a physically meaningful and accurate ML model. While this talk will illustrate the potential of the knowledge-guided machine learning (KGML) paradigm in the context of environmental problems (e.g., freshwater science, hydrology, agronomy), the paradigm has the potential to greatly advance the pace of discovery in a diverse set of disciplines where mechanistic models are used, e.g., climate science, weather forecasting, and pandemic management.

Biography:

Vipin Kumar is a Regents Professor and holds the William Norris Chair in the Department of Computer Science and Engineering at the University of Minnesota. His research spans data mining, high-performance computing, and their applications in Climate/Ecosystems and health care. He also served as the Director of the Army High Performance Computing Research Center (AHPCRC) from 1998 to 2005. He has authored over 400 research articles, and co-edited or coauthored 11 books, including two widely used textbooks, "Introduction to Parallel Computing" and "Introduction to Data Mining", and a recent edited collection, "Knowledge Guided Machine Learning". Kumar's current major research focus is on knowledge-guided machine learning and its applications to understanding the impact of human-induced changes on the Earth and its environment. Kumar's research on this topic is funded by NSF's BIGDATA, INFEWS, STC, GCR, and HDR programs, as well as ARPA-E, DARPA, and USGS. He has recently finished serving as the Lead PI of a 5-year, $10 million project, "Understanding Climate Change - A Data Driven Approach", funded by the NSF's Expeditions in Computing program. Kumar is a Fellow of the ACM, IEEE, AAAS, and SIAM. Kumar's foundational research in data mining and high performance computing has been honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD), the 2016 IEEE Computer Society Sidney Fernbach Award, one of IEEE Computer Society's highest awards in high performance computing, and the Test-of-Time Award from the 2021 Supercomputing conference (SC21).

 

Keynote Speaker 2

Dr. Gabriela Csurka

Principal Scientist at NAVER LABS Europe, France

Title:

 Visual Domain Adaptation in the Deep Learning Era

Abstract: 

As computer vision systems are being deployed in mission-critical applications whose predictions have real-world impact, but where real-world testing data statistics differ significantly from lab-collected training data, domain adaptation (DA) is gaining increasing societal importance. The aim of this talk is to give an overview of visual domain adaptation methods, starting with a brief introduction and recall of traditional domain adaptation algorithms proposed before the deep learning era. Then, I will provide an overview of the main trends in deep domain adaptation and I will discuss how to handle situations that depart from the classic domain adaptation setting, such as multi-domain learning, domain generalization, test-time adaptation or source-free domain adaptation. During the talk, I will discuss different DA application scenarios such as autonomous driving, visual localization, biomedical imaging, biometry and surveillance.

Biography:

Gabriela Csurka is a Principal Scientist at NAVER LABS Europe, France. Her main research interests are in computer vision for image understanding, 3D reconstruction, visual localization, as well as domain adaptation and transfer learning. She has contributed to more than 100 scientific communications, many in major CV conferences and journals. Concerning domain adaptation, in addition to related publications, she has given several invited talks and organized a related tutorial at ECCV’20. In 2017 she edited a Springer book entitled Domain Adaptation for Computer Vision Applications and recently co-authored a Morgan & Claypool book entitled Visual Domain Adaptation in the Deep Learning Era which is under publication.

 

Keynote Speaker 3

 

Professor Limsoon Wong

Chair Professor in the School of Computing at the National University of Singapore

Title:

Some bad practices in data analysis and machine learning

Abstract: 

With the democratization of data analysis and machine learning through many easy-to-use platforms, many lay analysts are now involved in analyzing data to hopefully produce actionable insight, as well as developing tools for modelling their data. Unlike professional statisticians who have the benefits of many years of rigorous training and many years of practising and perfecting the art of data analysis, lay analysts (like me, a computer scientist and logician) have rather ad hoc training. As a result, we have developed some bad data analysis habits, and some of us have even irresponsibly propagated these. In this talk, I will explain and bring attention to a few of these bad habits (including misusing principal component analysis as a dimension reduction tool, misunderstanding correlation as association, and mistreating accuracy as a one-dimensional performance measure), as well as discuss some impact of these bad habits (e.g., self-perpetuation of biased datasets).

Biography:

Limsoon Wong is Kwan-Im-Thong-Hood-Cho-Temple Chair Professor in the School of Computing at the National University of Singapore (NUS). He was also a professor (now honorary) of pathology in the Yong Loo Lin School of Medicine at NUS. Limsoon is a Fellow of the ACM, named in 2013 for his contributions to database theory and computational biology. His other recent awards in these two fields include the 2003 FEER Asian Innovation Gold Award for his work on treatment optimization of childhood leukemias and the ICDT 2014 Test of Time Award for his work on naturally embedded query languages.

 

Keynote Speaker 4

Professor Xin Yao

Chair Professor of Computer Science at the Southern University of Science and Technology, Shenzhen, China

Title:

Online Learning of Data Streams with Concept Drift

Abstract: 

A growing number of applications operate in such a way that new data arrive with time, i.e., as a data stream. We do not have an offline data set for training. We can learn only when data arrive, either as a single data sample or as a chunk of data samples. The challenge of learning a data stream is to continuously learn from such an incoming stream. To make things worse, the underlying distribution of the data may change with time (i.e., concept drift). This talk first describes the learning-in-the-model-space framework, which can be used effectively to learn data streams with few assumptions. Online fault diagnosis will be used as an example to illustrate how learning-in-the-model-space can facilitate detecting and classifying unknown faults. Then the talk will present an ensemble approach that can adapt ensemble diversity after a drift is detected in order to learn new concepts quickly and more accurately. Finally, the talk will introduce a new method for detecting both real and virtual drifts more accurately.

Biography:

Xin Yao is a Chair Professor of Computer Science at the Southern University of Science and Technology, Shenzhen, China, and a part-time Professor of Computer Science at the University of Birmingham, UK. His major research interests include evolutionary computation, ensemble learning and search-based software engineering. His work won the 2001 IEEE Donald G. Fink Prize Paper Award; 2010, 2015 and 2017 IEEE Transactions on Evolutionary Computation Outstanding Paper Awards; 2010 BT Gordon Radley Award for Best Author of Innovation (Finalist); 2011 IEEE Transactions on Neural Networks Outstanding Paper Award; and many other best paper awards. He received a prestigious Royal Society Wolfson Research Merit Award in 2012 and the IEEE CIS Evolutionary Computation Pioneer Award in 2013. He was recently selected to receive the 2020 IEEE Frank Rosenblatt Award.