May 14, 3:50 to 5:50, room 2

Cutting Heuristics in Computational Intelligence with Visual Data Mining

Boris Kovalerchuk, Dept. of Computer Science, Central Washington University, USA

A common approach to developing predictive learning and optimization models for real-world tasks is to select a class of mathematical models and then identify the models' parameters from the available data. Prior knowledge of the field, the task, and the data is typically too uncertain, insufficient, or confusing to make this selection on a solid scientific basis. As a result, selecting a class of high-dimensional models is more an art of guesswork or heuristics than a science. One of the major obstacles to a scientifically sound selection of the model class is that we cannot see the structure of data in multidimensional space with the naked eye, and this structure is critical for identifying the model class.

In contrast, we are often successful in model selection for 2-D or 3-D data that we can observe with the naked eye. In the multidimensional case, therefore, we are in essence guessing the class of models in advance, e.g., neural networks, SVM, Bayesian networks, linear regression, decision trees, linear discrimination, linear programming, and so on. The guesswork is not limited to selecting a class of models; it extends to selecting the internal components within each class, e.g., the number of hidden layers in a neural network, the type of kernel function in an SVM, k in the k-nearest neighbors method, the procedure for choosing the next splitting attribute in a decision tree, and so on. Guesswork in model class selection is a long-standing open problem that can be traced back to Ptolemy. Today, the guesswork often amounts to using methods that are simply available in some handy tool or that were learned before.

The overall goal of this tutorial is to present a body of studies aimed at cutting the guesswork and making the selection of a predictive model more scientifically rigorous, more effective for the task, and faster. Specifically, we focus on supervised and unsupervised learning tasks, namely classification and clustering of n-D data, using lossless visual representations of n-D data as graphs. These lossless displays are important because they make it possible (1) to restore all attributes of each n-D data point from its graph, (2) to leverage the unique power of human vision to compare hundreds of graph features in parallel, and (3) to speed up the selection of an appropriate class of models. These representations will be reviewed and compared with lossy representations such as Principal Component Analysis (PCA), Multidimensional Scaling (MDS), manifolds, and others.
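
To make the lossless/lossy distinction concrete, here is a minimal sketch, assuming classic parallel coordinates as one familiar example of a lossless n-D to 2-D graph representation; the abstract does not say which specific lossless representations the tutorial covers. The function names are hypothetical, and the lossy side is a deliberately simple fixed linear 4-D to 2-D map used as a stand-in for dimension reduction; it is not PCA or MDS.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def to_parallel_coordinates(x: Sequence[float]) -> List[Point2D]:
    """Encode an n-D point as a polyline in 2-D parallel coordinates.

    Vertex i lies on vertical axis X_i (horizontal position i) at height x[i],
    so the graph has one vertex per attribute.
    """
    return [(float(i), float(v)) for i, v in enumerate(x)]


def from_parallel_coordinates(polyline: Sequence[Point2D]) -> List[float]:
    """Restore every attribute: order vertices by axis position, read heights."""
    return [y for _, y in sorted(polyline)]


def linear_projection_2d(x: Sequence[float]) -> Point2D:
    """Hypothetical lossy 4-D -> 2-D linear map (a stand-in, not PCA/MDS).

    Summing pairs of coordinates discards how each sum was composed.
    """
    return (x[0] + x[1], x[2] + x[3])


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [2.0, 1.0, 4.0, 3.0]
    # Lossless: the original point is recovered exactly from its graph.
    print(from_parallel_coordinates(to_parallel_coordinates(a)) == a)
    # Lossy: two different 4-D points land on the same 2-D point.
    print(linear_projection_2d(a), linear_projection_2d(b))
```

Running it prints `True` and then `(3.0, 7.0) (3.0, 7.0)`: the polylines for `a` and `b` differ and each can be inverted, while the projection merges the two points. In this sketch, the price of losslessness is that the picture grows with n, one vertex per attribute.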

We will present lossless visual data mining and the benefits of using it together with common computational intelligence approaches. The tutorial will show the need to go beyond the traditional model selection methodology and argue that Visual Data Mining (VDM) is a powerful approach to meeting this fundamental challenge in computational learning. It will present how VDM actually helps to cut guesswork by generalizing its success with 2-D data, illustrated with case studies and simulation examples. It will show how VDM uses the human ability to detect certain structures visually in specific 2-D visual representations of n-D data, in accordance with the Gestalt laws of human visual perception. The tutorial will also discuss the current challenges and open problems in VDM, inviting the community to work on them. The success in exploiting these abilities for visual feature recognition depends on multiple mathematical and physiological factors. In its full scope, such modeling amounts to modeling human vision and mind. Thus, this tutorial complements the tutorial by Dr. Perlovsky.

Contents of the tutorial:

Model class justification vs. model class heuristic guess;

Lossy vs. lossless n-D data representations for cutting heuristics;

Review of lossy representations;

Review of lossless representations;

Mathematical properties;

Visual discovery algorithms and software for cutting heuristics;

Case studies;

Open problems.

Bio: Dr. Boris Kovalerchuk (http://www.cwu.edu/~borisk) is a professor of Computer Science at Central Washington University, USA. He is a co-author of two books, on Data Mining (Kluwer, 2000) and on Visual and Spatial Analysis (Springer, 2005), and of over 150 other publications, including a chapter in the Data Mining Handbook. His research interests are in data mining, machine learning, uncertainty modeling, relationships between probability theory and fuzzy logic, data fusion, visual analytics, and image and signal processing. He has been a principal investigator of several research projects in these areas supported by US Government agencies. Dr. Kovalerchuk served as a senior visiting scientist at the US Air Force Research Laboratory and as a member of several expert panels at international conferences, including WCCI, and of panels organized by US Government bodies.