Pareto-Optimal Data Compression for Binary Classification Tasks | JoVE Visualize

Area of Science:

Information Theory
Machine Learning
Computer Vision

Background:

Lossy data compression aims to minimize the storage cost of a dataset (X) while preserving as much information as possible about a specific attribute (Y), such as a class label.
Mathematically, this means finding a mapping X → Z that maximizes the mutual information I(Z, Y) while keeping the entropy H(Z) below a fixed threshold.
Existing methods often struggle to efficiently map the trade-off between compression and information preservation.
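
To make the two quantities in this objective concrete, the following sketch computes H(Z) and I(Z, Y) for a deterministic compression Z = f(X) of a small discrete joint distribution. The helper names `entropy` and `compress_stats` and the toy distribution are illustrative assumptions, not taken from the study.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def compress_stats(p_xy, f):
    """Return (H(Z), I(Z, Y)) in bits for the deterministic compression Z = f(X).

    p_xy is the joint distribution of (X, Y) as {(x, y): probability}.
    """
    p_zy = defaultdict(float)  # joint distribution of (Z, Y)
    for (x, y), p in p_xy.items():
        p_zy[(f(x), y)] += p
    p_z, p_y = defaultdict(float), defaultdict(float)
    for (z, y), p in p_zy.items():
        p_z[z] += p
        p_y[y] += p
    h_z = entropy(p_z)
    # I(Z, Y) = H(Z) + H(Y) - H(Z, Y)
    return h_z, h_z + entropy(p_y) - entropy(p_zy)


# Hypothetical toy example: X uniform on {0, 1, 2, 3}, class Y = 1 exactly when X >= 2.
p_xy = {(0, 0): 0.25, (1, 0): 0.25, (2, 1): 0.25, (3, 1): 0.25}
print(compress_stats(p_xy, lambda x: x))       # no compression: (2.0, 1.0)
print(compress_stats(p_xy, lambda x: x // 2))  # merge within each class: (1.0, 1.0)
```

In this toy case, merging x values that share a class halves the entropy cost (2 bits to 1) without losing any class information, which is exactly the kind of trade-off the constrained maximization formalizes.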

Purpose of the Study:

To develop a novel method for mapping out the Pareto frontier of classification tasks, which captures the trade-off between retained entropy and class information.
To present a technique for distilling data into a compressed representation that losslessly preserves class-discriminative information.
To generalize the discrete information bottleneck (DIB) problem and identify optimal compression points.

Main Methods:

A lossless mapping is proposed that distills data X with class label Y into a lower-dimensional vector W satisfying I(W, Y) = I(X, Y).
For binary classification, W is further compressed into a discrete variable Z by binning, with parameter β controlling the compression level.
Varying β sweeps out the Pareto frontier, generalizing the DIB problem and identifying key 'corner' points.
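
The distill-then-bin pipeline can be sketched on a toy discrete problem. The sketch assumes that, for binary classification, the distilled variable is the posterior w = p(Y=1 | x); because this is a sufficient statistic for Y, merging inputs that share the same w leaves I(W, Y) = I(X, Y). It also replaces the β-controlled binning, whose details are not given here, with a brute-force search over all contiguous binnings of the sorted w values, keeping only the non-dominated (H(Z), I(Z, Y)) points. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import log2


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_info(cells):
    """I(Z, Y) in bits, given one (p(z, Y=0), p(z, Y=1)) pair per value z."""
    p_z = [a + b for a, b in cells]
    p_y = [sum(a for a, _ in cells), sum(b for _, b in cells)]
    return entropy(p_z) + entropy(p_y) - entropy(p for cell in cells for p in cell)


def distill(p_x, w_x):
    """Merge inputs x sharing the same w = p(Y=1 | x); return sorted [(w, p(w))]."""
    p_w = {}
    for x, p in p_x.items():
        p_w[w_x[x]] = p_w.get(w_x[x], 0.0) + p
    return sorted(p_w.items())


def bin_cells(dist, cuts):
    """Split the sorted (w, p(w)) list into contiguous bins at the indices in cuts."""
    edges = [0, *cuts, len(dist)]
    return [
        (sum(p * (1 - w) for w, p in dist[lo:hi]), sum(p * w for w, p in dist[lo:hi]))
        for lo, hi in zip(edges, edges[1:])
    ]


def pareto_frontier(dist):
    """Score every contiguous binning of W; keep the non-dominated (H(Z), I(Z, Y)) points."""
    n = len(dist)
    points = set()
    for k in range(n):  # k cuts give k + 1 bins
        for cuts in combinations(range(1, n), k):
            cells = bin_cells(dist, cuts)
            h = entropy(a + b for a, b in cells)
            points.add((round(h, 12), round(mutual_info(cells), 12)))
    # A point is dominated if another has no more entropy and at least as much information.
    return sorted(
        (h, i) for h, i in points
        if not any(h2 <= h and i2 >= i and (h2, i2) != (h, i) for h2, i2 in points)
    )


# Hypothetical toy data: four equally likely inputs and their posteriors p(Y=1 | x).
p_x = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
w_x = {"a": 0.1, "b": 0.1, "c": 0.5, "d": 0.9}
raw = [(p * (1 - w_x[x]), p * w_x[x]) for x, p in p_x.items()]
dist = distill(p_x, w_x)
# I(X, Y) and I(W, Y) agree up to floating-point rounding.
print(mutual_info(raw), mutual_info([(p * (1 - w), p * w) for w, p in dist]))
for h, i in pareto_frontier(dist):
    print(f"H(Z) = {h:.3f} bits, I(Z, Y) = {i:.3f} bits")
```

Brute force costs 2^(n-1) binnings for n distinct values of w, so it only suits tiny examples; it is meant to make the entropy/information trade-off concrete, not to stand in for a β sweep.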

Main Results:

The method successfully maps the Pareto frontier for classification, demonstrating the trade-off between compression and information.
Applied to the CIFAR-10, MNIST, and Fashion-MNIST datasets, the approach acts as an information-theoretically optimal image clustering algorithm.
Pareto frontiers were found to be non-concave, and DIB phase transitions correspond to shifts between identified corner points.
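
Why a non-concave frontier produces DIB phase transitions can be illustrated with a short sketch. It assumes the DIB problem is posed in its usual Lagrangian form, minimizing H(Z) - β I(Z, Y) over candidate compressions. Minimizing a linear objective over a set of points always selects a point on the frontier's upper concave hull, so as β grows the optimum jumps discontinuously from one point to another, and points sitting in a non-concave dip are never chosen. The functions `dib_choice` and `phase_transitions` and the frontier values are hypothetical, not data from the study.

```python
def dib_choice(points, beta):
    """Return the (H, I) point minimizing the assumed DIB Lagrangian H(Z) - beta * I(Z, Y)."""
    return min(points, key=lambda pt: pt[0] - beta * pt[1])


def phase_transitions(points, betas):
    """Sweep beta upward and record (beta, point) whenever the optimal point changes."""
    changes, current = [], None
    for beta in betas:
        best = dib_choice(points, beta)
        if best != current:
            changes.append((beta, best))
            current = best
    return changes


# Hypothetical, non-concave frontier of (H(Z), I(Z, Y)) points in bits.
frontier = [(0.0, 0.0), (1.0, 0.5), (1.5, 0.55), (2.0, 0.9)]
for beta, (h, i) in phase_transitions(frontier, [k / 10 for k in range(1, 51)]):
    print(f"from beta = {beta:.1f}: optimum jumps to H = {h}, I = {i}")
# (1.5, 0.55) lies below the chord joining its neighbours, so no beta ever selects it.
```

The sweep visits only three of the four frontier points, each reached by a discrete jump; the summary reports that, on real data, such DIB jumps coincide with shifts between the identified corner points.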

Conclusions:

The proposed method provides an effective way to explore the information-theoretic limits of lossy compression for classification.
The identified 'corner' points offer a computationally efficient way to find optimal compression strategies without complex optimization.
The findings offer new insights into the behavior of DIB phase transitions and their relation to data clustering.