Explore image similarity
When a dataset contains images with distinctive characteristics, finding similar images is often crucial, for example so that you can relabel or remove them. Detecting these cases helps you assess how well the data is represented and how accurate the labels are, particularly when some classes are underrepresented or labels may have been assigned incorrectly. As datasets grow, identifying such cases manually becomes increasingly difficult.
Use Encord Active's similarity search feature to locate semantically similar images in your dataset. After you identify an edge case or duplicate, you can tag the images and take actions such as relabeling or deleting them.
Quick Tour
All the sections in the Quick Tour assume that you are already in a Project.
Tip
Choose any image in the Explorer workspace and click its Similar items button. This displays images/frames similar to the selected one, including any duplicates if they exist.
You can perform the search on up to 50 images/frames at the same time.
Explorer
The Explorer page has several areas that can help you find images of interest or duplicate images in your Project.
1: Refine and diversify your similarity search to find images of interest
There are times when a similarity search using only a single image is not enough. For example, you want to perform a similarity search for historic buildings, but the image you use for the search also contains a large number of trees. Some of the images returned from your search might not be historic buildings at all; they might simply be images with a large number of trees. To refine your search and add diversity to the returned results, select a number of images that are closer to what you want returned. In the historic building example, select several different historic buildings in different lighting and weather conditions. That way, the results returned from your search are more in line with what you want.
- From the Explorer, filter, sort, and search to find the images/frames you want, then select 1 or more of them.
  Tip
  You can select up to 50 images.
- Click Manage selection.
  A dropdown menu appears.
- Select Similarity search.
  Active displays images/frames similar to the selected images/frames.
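To see why several query images can give more focused results than one, here is a minimal sketch of a multi-image similarity search over embedding vectors. It is an illustration only: this page does not describe how Encord Active computes similarity, so the toy embeddings, the `search_similar` helper, and the strategy of averaging cosine similarity across the query images are all assumptions made for this example.

```python
import math

MAX_QUERY_IMAGES = 50  # selection limit stated on this page


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(query_ids, embeddings, top_k=3):
    """Hypothetical helper: rank non-query images by their mean cosine
    similarity to all query images (an assumed strategy, not Encord's)."""
    if not 1 <= len(query_ids) <= MAX_QUERY_IMAGES:
        raise ValueError("select between 1 and 50 query images")
    queries = [embeddings[q] for q in query_ids]
    scores = {}
    for image_id, vector in embeddings.items():
        if image_id in query_ids:
            continue  # do not return the query images themselves
        total = sum(cosine_similarity(vector, q) for q in queries)
        scores[image_id] = total / len(queries)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy 2-D embeddings (made up): axis 0 ~ "building", axis 1 ~ "trees".
embeddings = {
    "castle_with_trees": [0.5, 0.85],
    "cathedral_at_dusk": [1.0, 0.1],
    "town_hall_in_rain": [0.9, 0.2],
    "old_library": [0.95, 0.15],
    "forest": [0.05, 1.0],
}

single = search_similar(["castle_with_trees"], embeddings)
multi = search_similar(["castle_with_trees", "cathedral_at_dusk"], embeddings)
print("single query:", single)
print("two queries: ", multi)
```

With these toy vectors, the single tree-heavy query ranks `forest` first, while adding a second building image as a query pushes `forest` to the bottom of the results.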
2: Duplicates Shortcut
The Overview tab highlights as duplicates any images that have a `Uniqueness` value of 0 to 0.0001. You can adjust this range from the Filter tab.
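As a rough illustration of the logic behind this shortcut, the sketch below flags items whose Uniqueness score falls in the default range of 0 to 0.0001 and lists them in ascending order. The dictionary layout, the `find_duplicates` helper, the sample scores, and the choice to treat both bounds as inclusive are hypothetical; they do not reflect Encord Active's API or export format.

```python
# Default upper bound from this page; in the app it is adjustable in the Filter tab.
DUPLICATE_MAX_UNIQUENESS = 0.0001


def find_duplicates(scores, lower=0.0, upper=DUPLICATE_MAX_UNIQUENESS):
    """Hypothetical helper: return the ids whose score lies in
    [lower, upper] (bounds assumed inclusive), lowest score first."""
    flagged = [(score, image_id) for image_id, score in scores.items()
               if lower <= score <= upper]
    return [image_id for score, image_id in sorted(flagged)]


# Made-up Uniqueness scores keyed by image name.
scores = {
    "img_001.jpg": 0.42,
    "img_002.jpg": 0.0,
    "img_003.jpg": 0.00005,
    "img_004.jpg": 0.0002,
    "img_005.jpg": 0.87,
}

duplicates = find_duplicates(scores)
print(duplicates)
```

Widening the range, for example `find_duplicates(scores, upper=0.001)`, also picks up near-duplicates such as `img_004.jpg` in this sample.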
3: Sorting by `Uniqueness`
The entire Project can be sorted by `Uniqueness`. Sort in ascending order to display duplicates first.
4: Filtering by `Uniqueness`
Filter the entire Project using `Uniqueness`.
Go to Filter tab > Add Filter > Data Quality Metrics > Uniqueness.
A small histogram appears above the filter.
You can then change the filter settings to specify a range closer to 0.
Analytics
In a Project, go to the Analytics page and pick the `Uniqueness` quality metric for the Metric Distribution section.
The chart displays the distribution of data based on the `Uniqueness` scores.
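For intuition about what such a distribution chart summarizes, the following sketch bins a handful of Uniqueness scores into equal-width buckets and prints a text histogram. It assumes scores lie between 0 and 1, which this page does not state explicitly; the `uniqueness_histogram` helper and the sample scores are hypothetical and unrelated to how Encord Active builds its chart.

```python
def uniqueness_histogram(scores, num_bins=10):
    """Hypothetical helper: count scores in equal-width bins over [0, 1]
    (the [0, 1] range is an assumption). A score of 1.0 goes in the last bin."""
    counts = [0] * num_bins
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of assumed range [0, 1]: {score}")
        index = min(int(score * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up scores: a cluster near 0 suggests duplicates or near-duplicates.
sample_scores = [0.0, 0.00005, 0.02, 0.31, 0.35, 0.58, 0.92, 1.0]
num_bins = 5
counts = uniqueness_histogram(sample_scores, num_bins=num_bins)

for i, count in enumerate(counts):
    low = i / num_bins
    high = (i + 1) / num_bins
    print(f"{low:.1f}-{high:.1f} | {'#' * count}")
```

A tall first bar in a chart like this points to many low-Uniqueness items, which are the ones worth inspecting as possible duplicates.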