Explore image similarity


When a dataset contains images with distinctive characteristics, finding similar images becomes important, for example to relabel or remove them. Finding these images helps you assess how well the data represents each class and how accurate the labels are, particularly when some classes are underrepresented or labels may be wrong. As datasets grow, identifying such cases manually becomes increasingly difficult.

Use Encord Active's similarity search feature to locate semantically similar images in your dataset. Once you identify an edge case or duplicate, you can tag it and then take action, such as relabeling or deleting it.

Quick Tour

All of the sections in the Quick Tour assume that you are already in a Project.

Tip

Choose any image in the Explorer workspace and click its Similar items button. This displays images similar to the selected one, including any duplicates.
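The source does not describe how Encord Active computes similarity internally. As a generic illustration only, similarity search is often done by comparing image embedding vectors. The sketch below assumes each image already has an embedding and ranks the other images by cosine similarity; the `embeddings` data, file names, and function names are hypothetical and are not part of Encord Active.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=3):
    """Return the top_k (image_id, similarity) pairs closest to query_id."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id  # skip the query image itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real embeddings have far more dimensions.
embeddings = {
    "img_a.jpg": [1.0, 0.0, 0.0],
    "img_b.jpg": [1.0, 0.0, 0.0],  # same vector as img_a, i.e. a duplicate
    "img_c.jpg": [0.9, 0.1, 0.0],
    "img_d.jpg": [0.0, 1.0, 0.0],
}

for image_id, similarity in most_similar("img_a.jpg", embeddings, top_k=2):
    print(image_id, round(similarity, 4))
```

In this toy data, the duplicate ranks first because its embedding is identical to the query's, which mirrors how duplicates surface among similar items.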

Explorer

The Explorer page has three areas that can help you find duplicate images in your Project.

1: Duplicates Shortcut

The Duplicates shortcut is in the Overview tab. It highlights any image with a Uniqueness value between 0 and 0.0001 as a duplicate. You can adjust this range from the Filter tab.


2: Sorting by `Uniqueness`

You can sort the entire Project by Uniqueness. Sort in ascending order to display duplicates first.
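To see why ascending order puts duplicates first, here is a minimal sketch that assumes you have Uniqueness scores available as a plain mapping from image name to score. The `scores` values and the `sort_by_uniqueness` helper are hypothetical and do not represent an Encord Active API.

```python
def sort_by_uniqueness(scores):
    """Return (image_name, score) pairs ordered from least to most unique."""
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical scores for illustration.
scores = {
    "img_a.jpg": 0.42,
    "img_b.jpg": 0.0,
    "img_c.jpg": 0.00005,
    "img_d.jpg": 0.87,
}

ordered = sort_by_uniqueness(scores)
for name, score in ordered:
    print(name, score)
```

The images with scores nearest 0 come out at the top of the list, which is where likely duplicates sit.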


3: Filtering by `Uniqueness`

Filter the entire Project using Uniqueness.

Go to Filter tab > Add Filter > Data Quality Metrics > Uniqueness. A small histogram appears above the filter.

You can then adjust the filter to a range close to 0 to surface duplicates.
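The range filter can be thought of as keeping only the images whose score falls between a lower and an upper bound. The sketch below assumes scores are available as a plain mapping; the default bounds reuse the 0 to 0.0001 range mentioned for the Duplicates shortcut, and treating both bounds as inclusive is an assumption. `filter_by_uniqueness` is a hypothetical helper, not an Encord Active function.

```python
def filter_by_uniqueness(scores, low=0.0, high=0.0001):
    """Keep images whose score lies in [low, high] (inclusive bounds assumed)."""
    return {name: score for name, score in scores.items() if low <= score <= high}


# Hypothetical scores for illustration.
scores = {
    "img_a.jpg": 0.0,
    "img_b.jpg": 0.0001,
    "img_c.jpg": 0.0002,
    "img_d.jpg": 0.5,
}

likely_duplicates = filter_by_uniqueness(scores)
wider_range = filter_by_uniqueness(scores, high=0.001)
print(sorted(likely_duplicates))
print(sorted(wider_range))
```

Widening the upper bound pulls in near-duplicates as well as exact ones, so the range is a trade-off between recall and the number of images to review.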


Analytics

In a Project, go to the Analytics page and select the Uniqueness quality metric in the Metric Distribution section.


The chart displays the distribution of data based on the Uniqueness scores.
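A distribution chart of this kind groups scores into bins and counts the images in each bin. The sketch below is a minimal illustration that assumes Uniqueness scores lie between 0 and 1 (the source only states that values near 0 indicate duplicates); the function name and sample scores are hypothetical.

```python
def uniqueness_histogram(scores, num_bins=10):
    """Count scores per equal-width bin over the assumed range [0, 1]."""
    counts = [0] * num_bins
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of assumed range [0, 1]: {score}")
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical scores for illustration.
sample_scores = [0.0, 0.00005, 0.05, 0.42, 0.87, 1.0]
histogram = uniqueness_histogram(sample_scores)

for bin_index, count in enumerate(histogram):
    print(f"bin {bin_index}: {'#' * count}")
```

A tall first bin suggests many images with scores near 0, which is a hint that the dataset contains a large number of duplicates or near-duplicates.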

