Exploring embedding plots

Enhance the active learning cycle with embedding plots

Encord Active includes embedding plots: two-dimensional visualizations of high-dimensional data. Reducing the data to two dimensions makes it far easier to inspect visually, while aiming to preserve the structure and patterns present in the original data.
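To make the idea of dimensionality reduction concrete, here is a minimal sketch that projects high-dimensional embeddings onto two dimensions with principal component analysis (PCA), using NumPy. It is an illustrative stand-in only: Encord Active may use a different, possibly non-linear, reduction method, and the function name `project_to_2d` and the synthetic data are hypothetical.

```python
import numpy as np

def project_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings onto their top two principal components.

    Illustrative PCA stand-in; not necessarily the reduction Encord Active uses.
    """
    # Center the data so the principal axes pass through the mean.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the two directions that capture the most variance.
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
# Two synthetic clusters of 512-dimensional "image embeddings".
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(50, 512))
cluster_b = rng.normal(loc=1.0, scale=0.1, size=(50, 512))
points_2d = project_to_2d(np.vstack([cluster_a, cluster_b]))
print(points_2d.shape)  # (100, 2)
```

PCA is linear and deterministic, which keeps the sketch short; non-linear methods such as UMAP or t-SNE often separate clusters more clearly, at the cost of distorting global distances.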

The embedding plot helps you identify noteworthy clusters, understand your data more deeply, weakly label images, and exclude unwanted images. It is available on the Explorer page and shows either data or label embeddings, depending on the option selected in the Order by drop-down.

Vibrant 2D data embedding plot highlighting data patterns and clusters

Notice how images cluster in certain regions of the plot. By drawing a rectangle on the plot, you can quickly isolate the data points inside it and explore what those samples have in common.
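Selecting points inside a rectangle amounts to a bounds check on each point's 2D coordinates. The sketch below assumes each point carries a hypothetical `sample_id` together with its `x`/`y` plot coordinates; it illustrates the idea and is not Encord Active's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    sample_id: str  # hypothetical identifier for an image or label
    x: float        # first embedding coordinate
    y: float        # second embedding coordinate

def select_in_rectangle(points, x_min, x_max, y_min, y_max):
    """Return the points whose 2D coordinates fall inside the (inclusive) rectangle."""
    # Normalize the bounds so a rectangle dragged "backwards" still works.
    x_lo, x_hi = sorted((x_min, x_max))
    y_lo, y_hi = sorted((y_min, y_max))
    return [p for p in points if x_lo <= p.x <= x_hi and y_lo <= p.y <= y_hi]

points = [
    Point("img_001", 0.5, 0.5),
    Point("img_002", 0.8, 0.2),
    Point("img_003", 3.0, 4.0),
]
selected = select_in_rectangle(points, 1.0, 0.0, 0.0, 1.0)
print([p.sample_id for p in selected])  # ['img_001', 'img_002']
```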

When you select a region, the Explorer page updates to reflect the selection. You can then act on the selected group in several ways:

  • Tag the samples and later send them for labeling.
  • Examine how the model performs on the selected samples on the Predictions page.
  • Create similar subsets and compare them.

Samples in the data embedding plot carry no label information, so all points share the same color. In the label embedding plot, by contrast, points are color-coded by label class.

Vibrant 2D label embedding plot highlighting label patterns and clusters

In addition to selecting points within a rectangular area, the label embedding plot lets you filter data points by label class.
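Combining a class filter with a rectangular selection can be sketched as two successive checks per point. The `LabelPoint` record, its fields, the `Rect` layout, and `select_labels` are hypothetical names chosen for illustration, not part of Encord Active.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple

class LabelPoint(NamedTuple):
    label_id: str     # hypothetical label identifier
    label_class: str  # class name used to color the point
    x: float          # first embedding coordinate
    y: float          # second embedding coordinate

# Rectangle as (x_min, x_max, y_min, y_max); a hypothetical representation.
Rect = Tuple[float, float, float, float]

def select_labels(points: Iterable[LabelPoint],
                  classes: Optional[Iterable[str]] = None,
                  rect: Optional[Rect] = None) -> List[LabelPoint]:
    """Keep points of the chosen classes that lie inside the rectangle, if one is given."""
    wanted = set(classes) if classes is not None else None
    result = []
    for p in points:
        # Drop points whose class was not chosen in the class filter.
        if wanted is not None and p.label_class not in wanted:
            continue
        # Drop points outside the drawn rectangle, if there is one.
        if rect is not None:
            x_min, x_max, y_min, y_max = rect
            if not (x_min <= p.x <= x_max and y_min <= p.y <= y_max):
                continue
        result.append(p)
    return result

points = [
    LabelPoint("l1", "cat", 0.2, 0.3),
    LabelPoint("l2", "dog", 0.4, 0.1),
    LabelPoint("l3", "cat", 5.0, 5.0),
]
cats = select_labels(points, classes=["cat"])
cats_in_box = select_labels(points, classes=["cat"], rect=(0.0, 1.0, 0.0, 1.0))
print([p.label_id for p in cats])         # ['l1', 'l3']
print([p.label_id for p in cats_in_box])  # ['l1']
```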

With the label embedding plot, users can:

  • Identify classes that are often confused with each other.
  • Detect mislabeled samples, such as instances of one class embedded within a larger cluster of another class.
  • Spot outliers and remove them from the dataset.
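The three uses above can be approximated with a k-nearest-neighbor (k-NN) heuristic on the embedding coordinates: a point whose neighbors mostly belong to another class is a possible mislabel, class pairs that often appear in each other's neighborhoods are candidates for confusion, and points unusually far from their neighbors are outlier candidates. This heuristic is chosen for illustration and does not describe how Encord Active analyzes the plot; `audit_labels`, `k`, and `outlier_factor` are hypothetical.

```python
import math
import statistics
from collections import Counter

def nearest_neighbors(points, i, k):
    """Return (distance, index) pairs for the k points closest to points[i]."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]

def audit_labels(points, labels, k=3, outlier_factor=2.0):
    """Flag likely mislabels, confusable class pairs and outliers with a k-NN heuristic."""
    suspects = []          # (index, majority class among its neighbors)
    confusion = Counter()  # cross-class neighbor counts, keyed by sorted class pair
    mean_dists = []        # mean distance from each point to its k neighbors
    for i in range(len(points)):
        neighbors = nearest_neighbors(points, i, k)
        mean_dists.append(sum(d for d, _ in neighbors) / len(neighbors))
        votes = Counter(labels[j] for _, j in neighbors)
        majority, count = votes.most_common(1)[0]
        # A strict majority of neighbors from another class suggests a possible mislabel.
        if majority != labels[i] and count > k / 2:
            suspects.append((i, majority))
        # Every neighbor of a different class counts towards that class pair.
        for _, j in neighbors:
            if labels[j] != labels[i]:
                confusion[tuple(sorted((labels[i], labels[j])))] += 1
    # Points much farther from their neighbors than the median are outlier candidates.
    typical = statistics.median(mean_dists)
    outliers = [i for i, d in enumerate(mean_dists) if d > outlier_factor * typical]
    return suspects, confusion, outliers

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (30, 30)]
labels = ["cat", "cat", "cat", "cat", "dog",
          "dog", "dog", "dog", "dog", "dog"]
suspects, confusion, outliers = audit_labels(points, labels)
print(suspects)   # [(4, 'cat')]
print(confusion)  # Counter({('cat', 'dog'): 7})
print(outliers)   # [9]
```

Running the same heuristic on the original high-dimensional embeddings, rather than the 2D coordinates, avoids distortions introduced by the projection. The brute-force neighbor search is O(n²), so larger datasets would need a spatial index.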