ITC-Net-blend-60: a comprehensive dataset for robust network traffic classification in diverse environments

  • Information Theory and Coding (ITC) Laboratory, University of Tehran, Tehran, Iran.

Summary

This summary is machine-generated.

This study introduces a diverse mobile network traffic dataset to improve mobile application recognition in encrypted traffic. The dataset enhances the robustness and real-world applicability of network traffic classifiers.

Area Of Science

  • Computer Science
  • Network Security
  • Data Science

Background

  • Mobile application recognition in encrypted network traffic is crucial for network administration, security, and digital marketing.
  • Existing datasets typically originate from a single network environment, which makes it hard to evaluate how robust a model is under real-world conditions.
  • Developing adaptable network traffic classifiers for dynamic settings remains a significant challenge.

Purpose Of The Study

  • To address the limitations of current datasets by creating a more comprehensive and varied mobile network traffic dataset.
  • To facilitate the development and evaluation of robust mobile application recognition models.
  • To support research in network security and traffic analysis.

Main Methods

  • Collected traffic data from 60 popular Android applications across five distinct network scenarios.
  • Varied scenarios by changing Internet service provider (ISP), geographic location, device, application version, and user.
  • Captured traffic via real human interactions on physical devices, excluding background traffic and without requiring root access.

Main Results

  • The dataset contains over 48 million packets, 450,000 bidirectional flows, and 36 GB of data.
  • Traffic was generated from 60 applications under diverse network conditions.
  • The data collection method relied on real user interaction and needed no privileged (root) access, which keeps it practical to reproduce.
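
To make the term "bidirectional flow" concrete, here is a minimal sketch of one common way to group packets into such flows: both directions of a connection are mapped to the same key by ordering the two endpoints. The `Packet` type, its field names, and the sample addresses are assumptions for illustration only; the summary does not describe the dataset's file layout or the tool the authors used to extract flows.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; field names are illustrative, not the dataset's schema.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int


def flow_key(pkt):
    """Return a direction-independent key for the packet's flow."""
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    # Sorting the endpoints makes A->B and B->A produce the same key.
    first, second = sorted((a, b))
    return (first, second, pkt.protocol)


def group_into_flows(packets):
    """Group packets into bidirectional flows keyed by flow_key."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


# Toy packets: the first two are opposite directions of one connection.
packets = [
    Packet("10.0.0.2", 50000, "93.184.216.34", 443, "TCP", 517),
    Packet("93.184.216.34", 443, "10.0.0.2", 50000, "TCP", 1400),
    Packet("10.0.0.2", 50001, "93.184.216.34", 443, "TCP", 300),
]
flows = group_into_flows(packets)
print(len(flows))  # 2
```

Real flow extractors usually also split flows on timeouts or TCP connection teardown; this sketch leaves that out.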

Conclusions

  • The newly created dataset offers a valuable resource for training and validating mobile application recognition models.
  • Its diversity across network conditions enhances the evaluation of classifier robustness and real-world performance.
  • This work contributes to advancing network security and intelligent network management through improved traffic analysis.
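
One way a multi-scenario dataset like this could be used to test robustness is a leave-one-scenario-out split: train on four scenarios and test on the held-out fifth. The sketch below assumes each sample carries a `scenario` label; the labels "A" to "E", the `app` values, and the record layout are hypothetical, and this is not necessarily the evaluation protocol the authors followed.

```python
def leave_one_scenario_out(samples):
    """Yield (held_out, train, test) splits, one per distinct scenario label."""
    scenarios = sorted({s["scenario"] for s in samples})
    for held_out in scenarios:
        train = [s for s in samples if s["scenario"] != held_out]
        test = [s for s in samples if s["scenario"] == held_out]
        yield held_out, train, test


# Toy records: scenario labels and app names are placeholders, not dataset values.
samples = [
    {"scenario": sc, "app": app}
    for sc in ["A", "B", "C", "D", "E"]
    for app in ["app_1", "app_2"]
]

splits = list(leave_one_scenario_out(samples))
for held_out, train, test in splits:
    # No sample from the held-out scenario may appear in the training set.
    print(held_out, len(train), len(test))
```

A classifier whose accuracy drops sharply on the held-out scenario is likely relying on features tied to one ISP, device, or application version rather than to the application itself.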
