Publication date:
2019
Abstract:
Classification, meaning discrimination between examples belonging to different classes,
is a fundamental aspect of most scientific applications. Machine Learning (ML) tools have
proved to perform very well in this task, in the sense that they can achieve very high success
rates. On the other hand, the "realism" and interpretability of their models are very low,
often resulting in modest gains in knowledge and limited applicability. This paper describes a
methodology which, by applying ML tools directly to the data, allows the formulation of
new scientific models that describe the actual "physics" determining the
boundary between the classes. The proposed technique consists of a stacked approach of
different ML tools, each one applied to a specific subtask of the scientific analysis; taken
together, they combine all the major strands of machine learning, from rule-based classifiers
and Bayesian statistics to genetic programming and symbolic manipulation. To take into
account the error bars of the measurements, an essential aspect of any scientific form of
inference, the novel concept of the Geodesic Distance on Gaussian manifolds is adopted. The
characteristics of the methodology have been investigated with a series of systematic
numerical tests for different types of classification problems. The potential of the approach
to handle real data has been tested with various experimental databases. The results
indicate that the proposed method makes it possible to find a good trade-off between the accuracy
of the classification and the complexity of the derived mathematical equations. Moreover, the derived
models can be tuned to reflect the actual phenomena, providing a very useful tool to bridge
the gap between data, machine learning tools and scientific theories.
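As an illustration of the geodesic distance mentioned in the abstract, a minimal sketch follows. It assumes that each measurement is modelled as a univariate Gaussian (value as mean, error bar as standard deviation) and that the distance is the Fisher-Rao geodesic distance on that manifold. The abstract does not state the formula used in the paper, so the function name `geodesic_distance` and the closed form below are assumptions, not the authors' implementation.

```python
import math


def geodesic_distance(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Hypothetical helper for illustration only. It uses the standard closed
    form for univariate Gaussians, obtained by mapping (mu / sqrt(2), sigma)
    to the hyperbolic upper half-plane.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    # Argument of arccosh in the half-plane distance formula; always >= 1.
    arg = 1.0 + ((mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2) / (
        2.0 * sigma1 * sigma2
    )
    return math.sqrt(2.0) * math.acosh(arg)


# Two measurements with the same separation in their central values
# but different error bars: larger error bars give a smaller distance.
d_precise = geodesic_distance(0.0, 0.1, 1.0, 0.1)
d_noisy = geodesic_distance(0.0, 1.0, 1.0, 1.0)
```

Unlike the Euclidean distance between the central values, this distance shrinks as the error bars grow, so poorly measured points count as closer together. For equal means it reduces to sqrt(2) * |ln(sigma2 / sigma1)|.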
CRIS type:
01.01 Journal article
Keywords:
Machine Learning; ML
Authors:
Murari, Andrea
Published in: