When a dataset contains images with distinctive characteristics, it is often important to find similar images (for example, to relabel or remove them). Detecting these instances helps you assess how thoroughly the data is represented and how accurate the labels are, particularly when certain classes are underrepresented or labels may have been assigned incorrectly. As datasets grow, identifying such cases manually becomes increasingly difficult.

Use Encord Active’s similarity search feature to locate semantically similar images in your dataset. After you identify an edge case or duplicate, you can tag it and take actions such as relabeling or deleting it.

Quick Tour

All the sections in the Quick Tour assume that you are already in a Project.

Choose any image in the Explorer workspace and click its Similar items button. This displays images or frames similar to the selected one, including any duplicates.
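The following minimal sketch illustrates the general idea behind this kind of search. It is not Encord Active's implementation: it assumes, hypothetically, that each image is represented by an embedding vector and that similarity is measured with cosine similarity. The `similar_items` helper, the item IDs, and the 0.99 duplicate threshold are illustrative assumptions only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_items(
    query_id: str,
    embeddings: dict[str, list[float]],
    k: int = 3,
    duplicate_threshold: float = 0.99,  # hypothetical cut-off for "likely duplicate"
) -> list[tuple[str, float, bool]]:
    """Rank all other items by similarity to query_id and return the top k.

    Each result is (item_id, rounded_score, is_likely_duplicate).
    """
    query = embeddings[query_id]
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in embeddings.items()
        if item_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (item_id, round(score, 3), score >= duplicate_threshold)
        for item_id, score in scored[:k]
    ]


# Toy embeddings standing in for per-image feature vectors (hypothetical data).
embeddings = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [1.0, 0.0, 0.0],  # identical to img_a, i.e. a duplicate
    "img_c": [0.8, 0.6, 0.0],
    "img_d": [0.0, 0.0, 1.0],
}
print(similar_items("img_a", embeddings, k=2))
# → [('img_b', 1.0, True), ('img_c', 0.8, False)]
```

Ranking by a continuous score and then applying a threshold lets one query surface both near matches and exact duplicates, which mirrors how a single Similar items result list can contain both.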

You can perform the search on up to 50 images/frames at the same time.

Explorer

The Explorer page has several areas that can help you find images of interest or duplicate images in your Project.

Analytics

  1. In a Project, go to the Analytics page.

  2. Select or add a Distribution & Summary statistics chart.

  3. Select Uniqueness as the quality metric for the distribution.

  4. Adjust the Bucket value for the data as required.

    The chart displays the distribution of data based on the Uniqueness scores.
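To illustrate what such a chart summarizes, the following sketch computes a hypothetical uniqueness score and groups the scores into equal-width buckets. It assumes that uniqueness is 1 minus an item's cosine similarity to its nearest neighbour, so exact duplicates score 0. Encord Active's actual Uniqueness metric may be defined differently, and `uniqueness_scores` and `bucket_counts` are illustrative names, not part of its API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def uniqueness_scores(embeddings: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical uniqueness: 1 - cosine similarity to the nearest other item."""
    scores = {}
    for item_id, vec in embeddings.items():
        nearest = max(
            (
                cosine_similarity(vec, other)
                for other_id, other in embeddings.items()
                if other_id != item_id
            ),
            default=0.0,  # a lone item has no neighbour
        )
        scores[item_id] = 1.0 - nearest
    return scores


def bucket_counts(scores: list[float], buckets: int) -> list[int]:
    """Count scores falling into `buckets` equal-width bins spanning [0, 1]."""
    counts = [0] * buckets
    for score in scores:
        clamped = min(max(score, 0.0), 1.0)  # keep values inside [0, 1]
        index = min(int(clamped * buckets), buckets - 1)  # 1.0 goes in the last bin
        counts[index] += 1
    return counts


embeddings = {
    "img_a": [1.0, 0.0],
    "img_b": [1.0, 0.0],  # duplicate of img_a
    "img_c": [0.0, 1.0],
    "img_d": [1.0, 1.0],
}
scores = uniqueness_scores(embeddings)
print(bucket_counts(list(scores.values()), buckets=5))
# → [2, 2, 0, 0, 0]
```

Under this assumed definition, the lowest bucket collects the items most likely to be duplicates, and raising the bucket count splits the distribution more finely.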