Previous research

Variable Selection

Statisticians and data miners routinely build predictive models and infer dependencies between variables from observed data. However, in many emerging domains, such as bioinformatics and text mining, they face datasets characterized by a very large number of features (up to several thousand), a large amount of noise, non-linear dependencies and, often, only a few hundred samples. In this context, detecting functional relationships and designing effective classifiers are major challenges.

We developed the double input symmetrical relevance (DISR) method. Its rationale is that a set of variables can carry more information about the output class than the sum of the information carried by each variable taken individually. This property results from variable interaction. Additionally, DISR is well suited to large datasets because of its low computational cost (LNCS 2006).
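The following toy example illustrates the property stated above: two inputs that are individually uninformative can be fully informative when taken together. It is a minimal sketch using an XOR target and a plain empirical estimate of mutual information; the helper names (entropy, mutual_information) are hypothetical and this is not the DISR implementation itself.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy data (assumption for illustration): the class is the XOR of two binary inputs.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

mi_x1 = mutual_information(x1, y)                    # information of x1 alone
mi_x2 = mutual_information(x2, y)                    # information of x2 alone
mi_pair = mutual_information(list(zip(x1, x2)), y)   # information of the pair

print(mi_x1, mi_x2, mi_pair)
```

Here each input alone carries no information about the class, while the pair carries one full bit, so a selection criterion that scores variables one at a time would discard both.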

We showed that a variable selection approach based on DISR can be formulated as a quadratic optimization problem: the Dispersion Sum Problem (DSP). To solve this problem, we use a strategy based on Backward Elimination and Sequential Replacement (BESR). MASSIVE, the combination of the DISR method with BESR, was shown to be efficient compared to state-of-the-art feature selection methods (IEEE JSTSP 2008).

The importance of bringing causality into play when designing feature selection methods is increasingly acknowledged in the machine learning community. We proposed a variant of DISR that aims to prioritise direct causal relationships in feature selection problems where the ratio between the number of features and the number of samples is high. This approach is based on the notion of interaction, which is shown to be informative about the relevance of an input subset as well as its causal relationship with the target. The resulting filter is called mIMR (min-Interaction Max-Relevance) (ICML 2010).
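As an illustration of the backward-elimination phase mentioned above, here is a minimal sketch of a greedy heuristic for a dispersion-sum objective, i.e. choosing k items that maximise the sum of their pairwise scores. The weight matrix is made up for the example, the function names are hypothetical, and the sequential-replacement phase is not shown; this is not the MASSIVE implementation.

```python
def dispersion_sum(weights, subset):
    """Sum of the pairwise scores over all pairs of variables in the subset."""
    return sum(
        weights[i][j] for pos, i in enumerate(subset) for j in subset[pos + 1:]
    )


def backward_elimination(weights, k):
    """Greedily shrink the full variable set down to k variables.

    weights[i][j] is a symmetric pairwise score (a hypothetical stand-in for
    a score computed on the pair of variables i and j).
    """
    selected = list(range(len(weights)))
    while len(selected) > k:
        # Contribution of a variable = sum of its scores with the others kept.
        contribution = {
            i: sum(weights[i][j] for j in selected if j != i) for i in selected
        }
        # Drop the variable whose removal costs the objective the least.
        worst = min(selected, key=lambda i: contribution[i])
        selected.remove(worst)
    return selected


# Made-up symmetric score matrix for four variables.
w = [
    [0, 5, 1, 4],
    [5, 0, 2, 6],
    [1, 2, 0, 1],
    [4, 6, 1, 0],
]

subset = backward_elimination(w, 3)
print(subset, dispersion_sum(w, subset))
```

Backward elimination is only a heuristic for this problem: it commits to each removal, which is one reason a replacement step that revisits earlier choices can help.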

Application: Bioinformatics
We applied our methods to microarray data. Variable selection on microarray data makes it possible to identify a cell signature that can be used for diagnosis, i.e., differentiating malignant tumor cells from benign ones, and for prognosis, i.e., distinguishing tumor cells that are sensitive to a treatment from those that do not respond to it.

Information Theory

Variable selection and network inference are subdomains of the data-mining field. However, few methods in these fields can deal with non-linearity together with a large number of variables. We therefore needed to resort to more specific techniques. Information-theoretic methods offer an effective solution to these two issues. Our methods use mutual information, an information-theoretic measure of dependency. First, mutual information is a model-independent measure that has been used in data analysis to define concepts such as variable relevance, redundancy and interaction, and also to redefine theoretical machine learning concepts. Second, mutual information captures non-linear dependencies. Finally, mutual information is fairly fast to compute, so it can be evaluated many times in a reasonable amount of time, as required by datasets with a large number of variables.
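To make the point about non-linear dependencies concrete, the sketch below compares the Pearson correlation with an empirical mutual information estimate on a toy quadratic relationship. The data and the helper names (entropy, mutual_information, pearson) are assumptions chosen for illustration only.

```python
from collections import Counter
from math import log2, sqrt


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pearson(xs, ys):
    """Pearson linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


# Toy data (assumption): y is a deterministic but non-linear function of x.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]

corr = pearson(x, y)
mi = mutual_information(x, y)
print(corr, mi)
```

On this symmetric sample the linear correlation vanishes even though y is fully determined by x, whereas the mutual information is clearly positive.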

We introduced the infotheo R/C++ package to compute information-theoretic measures from a limited number of samples. The package provides six information-theoretic measures (entropy, conditional entropy, mutual information, conditional mutual information, multiinformation and interaction information), four entropy estimators (empirical, Miller-Madow, Schurmann-Grassberger and shrink) and three discretization methods (equal width, global equal width and equal frequency binning).
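For readers unfamiliar with these building blocks, the sketch below shows equal-width binning followed by the empirical and Miller-Madow entropy estimates, in nats. It is written in Python for illustration and does not reproduce the infotheo API: all function names and the sample values are hypothetical.

```python
from collections import Counter
from math import log


def discretize_equal_width(values, nbins):
    """Map each value to the index of one of nbins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    if width == 0:
        return [0 for _ in values]
    # The maximum falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), nbins - 1) for v in values]


def entropy_empirical(bins):
    """Plug-in (maximum likelihood) entropy estimate, in nats."""
    n = len(bins)
    return -sum((c / n) * log(c / n) for c in Counter(bins).values())


def entropy_miller_madow(bins):
    """Empirical estimate plus the Miller-Madow correction (m - 1) / (2n),
    where m is the number of non-empty bins and n the number of samples."""
    n = len(bins)
    nonempty = len(set(bins))
    return entropy_empirical(bins) + (nonempty - 1) / (2 * n)


# Made-up continuous sample.
values = [0.0, 1.0, 2.0, 3.0, 4.5, 6.0, 7.5, 9.0]
bins = discretize_equal_width(values, 3)

h_emp = entropy_empirical(bins)
h_mm = entropy_miller_madow(bins)
print(bins, h_emp, h_mm)
```

The correction term grows as the sample gets smaller relative to the number of occupied bins, which is the regime the plug-in estimate underestimates most.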


Ants can increase their abilities through cooperation, by combining individual efforts within a collective behavior. Multiple interactions in distributed systems give rise to a new kind of intelligence, i.e. a "swarm intelligence". Two noteworthy features of multi-agent systems should be underlined: 1. Robustness: the large number of entities makes the system insensitive to individual failures. 2. Flexibility (or adaptability): the number of entities varies with the type of task. In this context, it would be useful to have several robots able to adapt their pulling forces and adjust the number of participants in order to bring objects of different sizes and loads back to their base.

We developed an experimental framework for the collective pulling of one object by three robots. We showed that, although a solution can be found by random trials in the direction of the nest, much better coordination emerges from a minimal number of agents using stigmergic communication (STAIRS-ECAI 2004).
