A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning.

Ivan Lopez-Arevalo¹, Edwin Aldana-Bobadilla², Alejandro Molina-Villegas³

¹Centro de Investigación y de Estudios Avanzados del I.P.N., Unidad Tamaulipas, Victoria 87130, Mexico.

Entropy (Basel, Switzerland)

December 15, 2020

Summary

This summary is machine-generated.

This study introduces a method for encoding mixed-type data (numerical and categorical) for machine learning. Compared with one-hot encoding and feature hashing, the approach uses less memory while preserving the information content of the original data.

Keywords:

categorical data; data preprocessing; machine learning

Area of Science:

Machine Learning
Data Science
Information Theory

Background:

Many machine learning algorithms struggle with datasets that contain both numerical and categorical data.
Existing encoding methods such as one-hot encoding and feature hashing increase dataset dimensionality, which produces an excessive number of variables and noisy data.
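
To make the dimensionality point concrete, here is a minimal sketch of the two baseline encodings named above, written in plain Python. The function names, the sample column, and the bucket count are illustrative assumptions, not taken from the study; production code would normally rely on a library implementation.

```python
import hashlib

def one_hot(values):
    """One-hot encode a categorical column: one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

def feature_hash(values, n_buckets):
    """Hashing trick: map each category to one of n_buckets columns.

    The width is fixed in advance, but distinct categories may collide
    in the same bucket.
    """
    rows = []
    for v in values:
        digest = hashlib.md5(v.encode("utf-8")).hexdigest()
        row = [0] * n_buckets
        row[int(digest, 16) % n_buckets] = 1
        rows.append(row)
    return rows

# Hypothetical categorical column with four distinct values.
colors = ["red", "green", "blue", "green", "red", "yellow"]

categories, encoded = one_hot(colors)
hashed = feature_hash(colors, n_buckets=8)

print(len(categories))  # one original column becomes 4 columns
print(len(hashed[0]))   # hashing yields 8 columns regardless of cardinality
```

In both cases a single categorical variable expands into several columns, and with one-hot encoding the width grows with the number of distinct categories.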

Purpose of the Study:

To propose a novel encoding approach for mixed-type data that addresses the limitations of current methods.
To map mixed-type data into an information space, using Shannon's information theory to model the information content of the data.
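
As background for the information-theoretic idea, the sketch below computes two standard Shannon quantities for a categorical column: the self-information of each category, I(x) = -log2 p(x), and the entropy of the column. This is not the encoding proposed in the study, whose details are not given in this summary; it only illustrates, under the assumption that probabilities are estimated from relative frequencies, how a category can be represented by a single number. The function names and sample data are hypothetical.

```python
import math
from collections import Counter

def self_information(values):
    """Map each category to -log2 of its relative frequency (in bits)."""
    counts = Counter(values)
    n = len(values)
    return {v: -math.log2(c / n) for v, c in counts.items()}

def entropy(values):
    """Shannon entropy of the column in bits: the expected self-information."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical column: "red" has probability 0.5, the others 0.25 each.
colors = ["red", "red", "green", "blue"]

info = self_information(colors)
encoded = [info[v] for v in colors]  # one numeric column instead of several

print(info)
print(entropy(colors))
```

Note that this naive mapping assigns the same value to categories with equal frequency ("green" and "blue" here), so it does not by itself keep categories distinguishable.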

Main Methods:

Developed a new encoding technique based on Shannon's Theory to represent information in mixed-type data.
Evaluated the proposed method on ten UCI repository datasets and two real-world datasets.
Applied the encoding for classification, regression, and clustering tasks.

Main Results:

The novel encoding approach demonstrated promising results across various datasets and tasks.
Achieved superior memory efficiency compared to one-hot and feature-hashing encoding.
Successfully preserved the information content of the original mixed-type data.
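
The memory argument can be illustrated with simple arithmetic on the number of stored cells. The sketch below assumes, hypothetically, that a compact encoding keeps one column per original feature, whereas one-hot encoding needs one column per category; the row count and cardinalities are made-up inputs and do not reproduce the study's measurements.

```python
def cell_counts(n_rows, cardinalities, n_numeric):
    """Count stored cells for one-hot versus a one-column-per-feature encoding.

    cardinalities: number of distinct values of each categorical feature.
    n_numeric: number of numerical features, stored as-is in both cases.
    """
    one_hot_cells = n_rows * (sum(cardinalities) + n_numeric)
    compact_cells = n_rows * (len(cardinalities) + n_numeric)
    return one_hot_cells, compact_cells

# Hypothetical dataset: 1000 rows, 4 numerical features,
# 3 categorical features with 3, 50, and 200 distinct values.
one_hot_cells, compact_cells = cell_counts(1000, [3, 50, 200], 4)

print(one_hot_cells)
print(compact_cells)
```

The gap widens as categorical cardinality grows, which is where one-hot encoding becomes most costly.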

Conclusions:

The proposed encoding method offers a significant improvement over traditional techniques for handling mixed-type data.
This approach enhances memory efficiency and information preservation in machine learning preprocessing.
The method is effective for classification, regression, and clustering tasks.