When a dataset contains images with distinctive characteristics, finding similar images is often crucial (for example, to relabel or remove them). Finding these images helps you assess how well the data is represented and how accurate the labels are, particularly when some classes may be underrepresented or labels may be wrong. As datasets grow, finding such cases manually becomes increasingly difficult.
Use Encord Active's similarity search feature to locate semantically similar images in your dataset. After you identify an edge case or duplicate, you can tag it and then take action, such as relabeling or deleting it.
All of the sections in the Quick Tour assume that you are already in a Project.
Choose any image in the Explorer workspace and click its Similar items button. This displays images similar to the selected one, including any duplicates if they exist.
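To build intuition for what a similarity search does, here is a minimal sketch of one common approach: ranking images by the cosine similarity of their embedding vectors. This is an illustration under stated assumptions, not Encord Active's implementation. The `embeddings` dictionary, the file names, and the helper functions `cosine_similarity` and `most_similar` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=3):
    """Return up to top_k (image_id, similarity) pairs, most similar first."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real ones would come from an image model.
embeddings = {
    "img_a.jpg": [1.0, 0.0, 0.0],
    "img_b.jpg": [1.0, 0.0, 0.0],  # identical vector, i.e. a duplicate of img_a
    "img_c.jpg": [0.9, 0.1, 0.0],  # close, but not identical
    "img_d.jpg": [0.0, 1.0, 0.0],  # unrelated
}

results = most_similar("img_a.jpg", embeddings, top_k=2)
for image_id, similarity in results:
    print(image_id, round(similarity, 4))
```

In this sketch the duplicate ranks first because its vector is identical to the query, which mirrors how duplicates show up at the top of a similar-items view.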
The Explorer page has three areas that can help you find duplicate images in your Project.
1: Duplicates Shortcut
Found in the Overview tab, any images that have a Uniqueness value of 0 to 0.0001 are highlighted as duplicates. You can adjust this value from the Filter tab.
2: Sorting by `Uniqueness`
The entire Project can be sorted by Uniqueness. Sort by ascending order to display duplicates first.
3: Filtering by `Uniqueness`
Filter the entire Project using Uniqueness.
Go to Filter tab > Add Filter > Data Quality Metrics > Uniqueness. A small histogram appears above the filter.
You can then change the filter settings to specify a range closer to 0.
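The three approaches above all operate on the same per-image Uniqueness value. The following sketch shows the equivalent logic on a plain dictionary of scores. It assumes you already have the scores outside the UI; the `scores` data and the function names are hypothetical, and only the 0 to 0.0001 duplicate range comes from the text above.

```python
# Upper bound of the default duplicate range described above (0 to 0.0001).
DUPLICATE_MAX = 0.0001

# Hypothetical per-image Uniqueness scores.
scores = {
    "img_a.jpg": 0.0,
    "img_b.jpg": 0.00005,
    "img_c.jpg": 0.42,
    "img_d.jpg": 0.87,
}


def find_duplicates(scores, max_value=DUPLICATE_MAX):
    """1: Shortcut. Images whose score falls in the duplicate range."""
    return [name for name, value in scores.items() if 0 <= value <= max_value]


def sort_by_uniqueness(scores):
    """2: Sorting. Ascending order puts likely duplicates first."""
    return sorted(scores, key=scores.get)


def filter_by_range(scores, low, high):
    """3: Filtering. Keep only images with low <= score <= high."""
    return {name: value for name, value in scores.items() if low <= value <= high}


print(find_duplicates(scores))
print(sort_by_uniqueness(scores))
print(filter_by_range(scores, 0.0, 0.5))
```

Widening `max_value` or the filter range trades precision for recall: a larger upper bound surfaces near-duplicates as well as exact ones.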
In a Project, go to the Analytics page and pick the Uniqueness quality metric for the Metric Distribution section.
The chart displays the distribution of data based on the Uniqueness metric.
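A metric distribution chart is essentially a histogram of the metric values. As a rough sketch of that idea, the code below bins a list of Uniqueness scores into equal-width buckets. The sample `values`, the `histogram` helper, and the assumption that scores lie between 0 and 1 are all hypothetical and not taken from Encord Active.

```python
def histogram(values, bins=5, low=0.0, high=1.0):
    """Count how many values fall into each of `bins` equal-width buckets."""
    counts = [0] * bins
    width = (high - low) / bins
    for value in values:
        if not low <= value <= high:
            continue  # ignore values outside the assumed range
        # Clamp so that value == high lands in the last bucket.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical Uniqueness scores, assumed to lie in [0, 1].
values = [0.0, 0.00005, 0.42, 0.87, 0.91]
counts = histogram(values)
print(counts)
```

A tall first bucket suggests many images with low Uniqueness, which is where duplicates and near-duplicates would be expected to concentrate.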