Explore image similarity

When a dataset contains images with distinctive characteristics, you often need to find similar images, for example to relabel or remove them. Finding these images helps you assess how well the data is represented and how accurate the labels are, particularly when some classes are underrepresented or labels are incorrect. As datasets grow, finding such cases manually becomes increasingly difficult.

Use Encord Active's similarity search feature to find semantically similar images in your dataset. After you identify an edge case or duplicate, you can tag it and then take action, such as relabeling or deleting it.

Quick Tour

All the sections in the Quick Tour assume that you are already in a Project.

👍

Tip

Choose any image in the Explorer workspace and click its Similar items button. This displays images/frames similar to the selected one, including any duplicates if they exist.

You can perform the search on up to 50 images/frames at the same time.

Explorer

The Explorer page has several areas that can help you find images of interest or duplicate images in your Project.

1: Refine and diversify your similarity search to find images of interest

A similarity search based on a single image is sometimes not enough. For example, suppose you want to search for historic buildings, but the image you search with also contains a large number of trees. Some of the images returned might not show historic buildings at all; they might only show a large number of trees. To refine your search and add diversity to the returned results, select several images that are closer to what you want returned. In the historic building example, select several different historic buildings in different lighting and weather conditions. The results returned from your search are then more in line with what you want.

  1. From the Explorer, filter, sort, or search to find the images/frames you want, then select one or more of them.

    👍

    Tip

    You can select up to 50 images.

    Similarity search on multiple images/frames

  2. Click Manage selection.
    A dropdown menu appears.

  3. Select Similarity search.
    Active displays similar images/frames of the selected images/frames.
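The effect of searching with several images can be illustrated with a small sketch. This is not Encord Active's implementation or API: it assumes each image is represented by an embedding vector and ranks candidates by their mean cosine similarity to all selected query images. The function name `multi_query_similarity`, the toy 2-D embeddings, and the averaging strategy are all assumptions made for illustration.

```python
import numpy as np


def multi_query_similarity(embeddings, query_indices, top_k=5):
    """Rank items by mean cosine similarity to a set of query items.

    Hypothetical helper for illustration; not part of any Encord API.
    """
    # Mirror the documented limit of 50 selected images/frames.
    if not 1 <= len(query_indices) <= 50:
        raise ValueError("select between 1 and 50 query items")
    # Normalize rows so that dot products equal cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Similarity of every item to every query, averaged over the queries.
    scores = (unit @ unit[query_indices].T).mean(axis=1)
    # Highest score first, skipping the query items themselves.
    order = np.argsort(-scores)
    query_set = set(query_indices)
    return [int(i) for i in order if int(i) not in query_set][:top_k]


# Toy embeddings (made-up values, one row per image).
embeddings = np.array([
    [1.0, 0.0],    # 0: query image A
    [0.0, 1.0],    # 1: query image B
    [0.7, 0.7],    # 2: resembles both queries
    [1.0, 0.1],    # 3: resembles only query A
    [-1.0, -1.0],  # 4: resembles neither query
])

result = multi_query_similarity(embeddings, [0, 1], top_k=3)
print(result)
```

In this sketch, an image that resembles both query images ranks above one that resembles only a single query, which is why selecting several varied examples reduces the influence of incidental content such as the trees in the example above.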

2: Duplicates Shortcut

The Duplicates shortcut is in the Overview tab. It highlights any images with a Uniqueness value between 0 and 0.0001 as duplicates. You can adjust this range from the Filter tab.

Duplicates shortcut
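As a rough sketch of the rule the shortcut applies, the snippet below flags items whose Uniqueness score falls within the 0 to 0.0001 range. It assumes you already have Uniqueness scores available as a plain dictionary; the file names, the scores, the `find_duplicates` helper, and the inclusive upper bound are hypothetical and do not come from Encord Active.

```python
# Upper bound of the default duplicate range described above.
DUPLICATE_THRESHOLD = 0.0001


def find_duplicates(uniqueness_by_image, threshold=DUPLICATE_THRESHOLD):
    """Return the names of images whose score lies in [0, threshold].

    Hypothetical helper; treating the bound as inclusive is an assumption.
    """
    return sorted(
        name
        for name, score in uniqueness_by_image.items()
        if 0 <= score <= threshold
    )


# Made-up scores for illustration only.
scores = {
    "a.jpg": 0.0,
    "b.jpg": 0.00005,
    "c.jpg": 0.42,
    "d.jpg": 0.0001,
    "e.jpg": 0.0002,
}

duplicates = find_duplicates(scores)
print(duplicates)
```

Raising the threshold widens the set of images treated as duplicates, which corresponds to adjusting the value from the Filter tab.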

3: Sorting by `Uniqueness`

You can sort the entire Project by Uniqueness. Sort in ascending order to display duplicates first, because duplicates have Uniqueness values at or near 0.

Sorting by `Uniqueness`
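To see why ascending order surfaces duplicates first, consider this minimal sketch. The records are made-up `(file name, Uniqueness score)` pairs, not data exported from Encord Active.

```python
# Hypothetical (file name, Uniqueness score) pairs.
records = [
    ("img_001.jpg", 0.42),
    ("img_002.jpg", 0.0),
    ("img_003.jpg", 0.87),
    ("img_004.jpg", 0.00005),
]

# Ascending sort: the lowest scores, which are the likely duplicates, come first.
ascending = sorted(records, key=lambda record: record[1])

for name, score in ascending:
    print(f"{score:.5f}  {name}")
```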

4: Filtering by `Uniqueness`

Filter the entire Project using Uniqueness.

Go to Filter tab > Add Filter > Data Quality Metrics > Uniqueness. A small histogram appears above the filter.

To focus on duplicates, change the filter settings to specify a range closer to 0.

Filtering by `Uniqueness`
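The effect of narrowing the filter range can be sketched as follows. The scores are made up, the `count_in_range` helper is hypothetical, and using 1.0 as the widest upper bound assumes that Uniqueness scores lie between 0 and 1.

```python
def count_in_range(scores, low, high):
    """Count the scores that fall inside the inclusive range [low, high]."""
    return sum(1 for score in scores if low <= score <= high)


# Made-up Uniqueness scores for six images.
scores = [0.42, 0.0, 0.87, 0.00005, 0.03, 0.0008]

# Successively tighter upper bounds, each range starting at 0.
upper_bounds = [1.0, 0.1, 0.001]
counts = [count_in_range(scores, 0.0, upper) for upper in upper_bounds]

for upper, count in zip(upper_bounds, counts):
    print(f"0 to {upper}: {count} images")
```

Each tighter range keeps fewer images, leaving only those most likely to be duplicates or near-duplicates.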

Analytics

In a Project, go to the Analytics page and select the Uniqueness quality metric in the Metric Distribution section.

Distribution of data based on Uniqueness scores

The chart displays the distribution of data based on the Uniqueness scores.
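A comparable distribution can be approximated outside the UI with NumPy's `histogram` function. This sketch is not how Encord Active builds its chart; the scores are made up, and the choice of five bins over the range 0 to 1 is an assumption.

```python
import numpy as np

# Made-up Uniqueness scores, assumed to lie between 0 and 1.
scores = np.array([0.0, 0.00005, 0.1, 0.15, 0.45, 0.5, 0.55, 0.9])

# Count how many scores fall into each of five equal-width bins.
counts, edges = np.histogram(scores, bins=5, range=(0.0, 1.0))

# Print a simple text histogram, one row per bin.
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:.1f}-{right:.1f}  {'#' * int(count)}")
```

A tall first bin in such a chart suggests that many items have low Uniqueness scores and may be worth reviewing as duplicates.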

