Data quality metrics

Data quality metrics work on images or individual video frames.

Access Data Quality Metrics

Data Quality Metrics are used for sorting data, filtering data, and data analytics.

Title	Metric Type	Ontology Type
Area - Ranks images by their area (width x height).	`image`
Aspect Ratio - Ranks images by their aspect ratio (width/height).	`image`
Blue Value - Ranks images by how blue the average value of the image is.	`image`
Brightness - Ranks images by their brightness.	`image`
Contrast - Ranks images by their contrast.	`image`
Diversity - Forms clusters based on the ontology and ranks images from the easiest to the hardest samples to annotate.	`image`
Frame Number - Selects images based on a specified range.	`image`
Green Value - Ranks images by how green the average value of the image is.	`image`
Height - Ranks images by the height of the image.	`image`
Object Count - Counts the number of objects in the image.	`image`	`bounding box`, `checklist`, `point`, `polygon`, `polyline`, `radio`, `rotatable bounding box`, `skeleton`, `text`
Object Density - Computes the percentage of image area that is occupied by objects.	`image`	`bounding box`, `polygon`, `rotatable bounding box`
Randomize Images - Assigns a random value between 0 and 1 to images.	`image`
Red Value - Ranks images by how red the average value of the image is.	`image`
Sharpness - Ranks images by their sharpness.	`image`
Uniqueness - Finds duplicate and near-duplicate images.	`image`
Width - Ranks images by the width of the image.	`image`

To access Data Quality Metrics for Explorer:

Click a Project from the Active home page.
Click Explorer.
Click Data.
Sort and filter the tabular data.
Click the plot diagram icon.
Sort and filter the embedding plot data.

To access Data Quality Metrics for analytics:

Click a Project from the Active home page.
Click Analytics.
Click Data.
Select the quality metric you want to view from the 2D Metrics view or Metrics Distribution graphs.

Area

Ranks images by their area. Area is computed as the product of image width and image height (width x height). Implementation on GitHub.

Aspect Ratio

Ranks images by their aspect ratio. Aspect ratio is computed as the ratio of image width to image height (width / height). Implementation on GitHub.
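
As a minimal sketch of the Area and Aspect Ratio formulas, the two helpers below compute both values from an image's pixel dimensions. The function names are illustrative and are not taken from the implementation on GitHub.

```python
def area(width: int, height: int) -> int:
    """Image area in pixels: width x height."""
    return width * height


def aspect_ratio(width: int, height: int) -> float:
    """Aspect ratio: width / height."""
    return width / height


# A 1920 x 1080 video frame.
print(area(1920, 1080))          # 2073600
print(aspect_ratio(1920, 1080))  # roughly 1.78 (16:9)
```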

Blue Value

Ranks images by how blue the average value of the image is. Implementation on GitHub.
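
One simple way to read "how blue the average value of the image is" is to average a single colour channel over all pixels. The sketch below does this under that assumption; the actual implementation may score colour differently (for example, in another colour space), and the names `mean_channel_value`, `RED`, `GREEN`, and `BLUE` are hypothetical.

```python
from statistics import fmean

# Channel indices for (r, g, b) tuples. Illustrative names only.
RED, GREEN, BLUE = 0, 1, 2


def mean_channel_value(pixels, channel):
    """Average of one colour channel, scaled from 0..255 to 0..1.

    `pixels` is a flat list of (r, g, b) tuples with 8-bit values.
    """
    return fmean(p[channel] for p in pixels) / 255.0


sky = [(30, 60, 255), (20, 80, 255)]
grass = [(20, 200, 30), (40, 180, 50)]

# The sky patch scores higher on blue than the grass patch.
print(mean_channel_value(sky, BLUE) > mean_channel_value(grass, BLUE))
```

Under the same assumption, passing `GREEN` or `RED` as the channel gives an analogous score for the Green Value and Red Value metrics.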

Brightness

Ranks images by their brightness. Brightness is computed as the average (normalized) pixel value across each image. Implementation on GitHub.
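
The computation can be sketched as follows, assuming 8-bit grayscale values and normalization by 255. The input format (a flat list of pixel values) and the function name are chosen for illustration only.

```python
from statistics import fmean


def brightness(gray_pixels):
    """Mean pixel value, normalized from 0..255 to 0..1."""
    return fmean(gray_pixels) / 255.0


print(brightness([0, 0, 0, 0]))        # 0.0 for an all-black image
print(brightness([255, 255, 255]))     # 1.0 for an all-white image
```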

Contrast

Ranks images by their contrast. Contrast is computed as the standard deviation of the pixel values. Implementation on GitHub.
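
A minimal sketch of this score, assuming grayscale pixel values in a flat list and the population standard deviation (the same convention as NumPy's default `std`). Whether the real implementation normalizes the values first is not stated here, so the sketch leaves them on their original scale.

```python
from statistics import pstdev


def contrast(gray_pixels):
    """Population standard deviation of the pixel values."""
    return pstdev(gray_pixels)


flat = [128, 128, 128, 128]       # uniform grey: no contrast
checker = [0, 255, 0, 255]        # alternating black and white

print(contrast(flat))     # 0.0
print(contrast(checker))  # 127.5
```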

Diversity

Use this metric to select the first samples to annotate when there are no labels in the project. Choosing simple samples that represent the classes well gives better results. The metric ranks images from the easiest to the hardest samples to annotate: easy samples have lower scores, while hard samples have higher scores.

Algorithm

K-means clustering is applied to the image embeddings. The number of clusters is taken from the Ontology file (if the ontology contains both object-level and image-level classes, the number of object classes is used as the cluster count). If no ontology information exists, K is set to 10.
Samples in each cluster are ranked by their proximity to the cluster center. Samples closer to the cluster center are treated as easy samples.
The clusters are then combined so that the result is ordered from easy to hard and the number of samples per class is balanced within the first N samples.
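
The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation on GitHub: the k-means routine is a basic Lloyd's iteration with a fixed seed, the clusters are combined by simple round-robin interleaving, and all function names are hypothetical.

```python
import math
import random
from itertools import zip_longest
from statistics import fmean


def assign(points, centers):
    """Group point indices by their nearest center."""
    clusters = [[] for _ in centers]
    for idx, point in enumerate(points):
        nearest = min(range(len(centers)),
                      key=lambda c: math.dist(point, centers[c]))
        clusters[nearest].append(idx)
    return clusters


def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's k-means. Returns (centers, clusters of point indices)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    dims = range(len(points[0]))
    for _ in range(iterations):
        clusters = assign(points, centers)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(
                    fmean(points[i][d] for i in members) for d in dims
                )
    return centers, assign(points, centers)


def rank_easy_to_hard(embeddings, k=10):
    """Return sample indices ordered from easy to hard, balanced by cluster."""
    k = min(k, len(embeddings))
    centers, clusters = kmeans(embeddings, k)
    # Step 2: within each cluster, closest to the center comes first.
    per_cluster = [
        sorted(members, key=lambda i, c=c: math.dist(embeddings[i], centers[c]))
        for c, members in enumerate(clusters)
    ]
    # Step 3: take one sample per cluster in turn (round-robin).
    order = []
    for round_ in zip_longest(*per_cluster):
        order.extend(i for i in round_ if i is not None)
    return order


# Two well-separated groups of 2D "embeddings".
embeddings = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0),
              (10.0, 0.0), (12.0, 0.0), (8.0, 0.0)]
order = rank_easy_to_hard(embeddings, k=2)
print(order[:2])  # the two samples sitting on the cluster centers
```

Round-robin interleaving is one straightforward way to keep the first N samples balanced across clusters; the position of a sample in `order` could then serve as its easy-to-hard score.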

Implementation on GitHub.

Frame Number

Selects a range of images in a video or a sequential group of images.

Green Value

Ranks images by how green the average value of the image is. Implementation on GitHub.

Height

Ranks images by the height of the image. Implementation on GitHub.

Object Count

Counts the number of objects in the image. Implementation on GitHub.

Object Density

Computes the percentage of image area that is occupied by objects. Implementation on GitHub.
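
As a rough sketch for axis-aligned bounding boxes, the function below marks every pixel covered by at least one box and reports the covered share of the image as a percentage. Counting overlapping regions only once is an assumption, as is the `(x, y, width, height)` box format in pixel units; polygons and rotatable boxes would need a proper rasterizer.

```python
def object_density(image_width, image_height, boxes):
    """Percentage of image pixels covered by at least one box.

    `boxes` is a list of (x, y, width, height) tuples in pixels.
    """
    covered = set()
    for x, y, w, h in boxes:
        # Clip each box to the image bounds before marking pixels.
        for px in range(max(0, x), min(image_width, x + w)):
            for py in range(max(0, y), min(image_height, y + h)):
                covered.add((px, py))
    return 100.0 * len(covered) / (image_width * image_height)


# One box covering the left half of a 10 x 10 image.
print(object_density(10, 10, [(0, 0, 5, 10)]))                # 50.0
# A second box fully inside the first does not add any area.
print(object_density(10, 10, [(0, 0, 5, 10), (0, 0, 5, 5)]))  # 50.0
```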

Randomize Images

Uses a uniform distribution to generate a value between 0 and 1 for each image. Implementation on GitHub.
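
A minimal sketch of this idea, using Python's standard `random` module. The function name and the optional `seed` parameter are illustrative additions; note that `random()` draws from the half-open interval [0, 1).

```python
import random


def random_scores(image_ids, seed=None):
    """Assign each image a uniformly distributed value in [0, 1)."""
    rng = random.Random(seed)
    return {image_id: rng.random() for image_id in image_ids}


scores = random_scores(["img_a", "img_b", "img_c"], seed=42)
# Sorting by score yields a reproducible shuffle of the images.
print(sorted(scores, key=scores.get))
```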

Red Value

Ranks images by how red the average value of the image is. Implementation on GitHub.

Sharpness

Ranks images by their sharpness. Sharpness is computed by applying a Laplacian filter to each image and computing the variance of the output. In short, the score computes “the amount of edges” in each image.

score = cv2.Laplacian(image, cv2.CV_64F).var()
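
To make the one-liner above more concrete, the following dependency-free sketch applies the 4-neighbour Laplacian kernel (the 3 x 3 kernel OpenCV uses by default) to a grayscale image stored as a list of rows, then takes the variance of the responses. It only evaluates interior pixels, so its values will differ slightly from `cv2.Laplacian`, which also computes responses at the borders. The function name is hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the Laplacian response over interior pixels.

    `gray` is a list of rows of grayscale values, at least 3 x 3 in size.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Kernel: [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


flat = [[100] * 5 for _ in range(5)]
checkerboard = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]

print(laplacian_variance(flat))              # no edges, so no variance
print(laplacian_variance(checkerboard) > 0)  # many edges, high variance
```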

Implementation on GitHub.

Uniqueness

This metric gives each image a score that indicates how unique it is.

A score of zero means that the image has duplicates in the dataset, while a score close to one means that the image is highly unique. Within a group of duplicate images, only one image receives a non-zero score and the rest receive a score of zero (for example, if there are five identical images, four of them have a score of zero). This makes it easy to tag the duplicate samples and remove them from the project.
Images that are near duplicates of each other will be shown side by side.
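
The zero-score rule for duplicates can be illustrated with a small sketch. It makes two loud assumptions: exact byte equality (through a SHA-256 hash) stands in for whatever similarity measure the real metric uses, and the non-zero score is a fixed placeholder of 1.0 instead of a real uniqueness value. The function name is hypothetical.

```python
import hashlib


def duplicate_scores(images):
    """Map each image id to 0.0 if an identical image was already seen.

    `images` maps image ids to raw bytes. The first image in each group of
    identical images keeps a placeholder score of 1.0.
    """
    seen = set()
    scores = {}
    for image_id, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            scores[image_id] = 0.0
        else:
            seen.add(digest)
            scores[image_id] = 1.0  # placeholder, not a real uniqueness value
    return scores


# Five identical images plus one different image.
images = {f"copy_{i}": b"same pixels" for i in range(5)}
images["other"] = b"different pixels"

scores = duplicate_scores(images)
to_tag = [image_id for image_id, s in scores.items() if s == 0.0]
print(to_tag)  # ['copy_1', 'copy_2', 'copy_3', 'copy_4']
```

Filtering on a score of zero, as in `to_tag`, mirrors the bulk-tagging workflow described under Possible actions.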

Possible actions

To delete duplicate images: Set the quality filter to include only zero values, which selects all the duplicate images, then use bulk tagging (for example, with a tag like Duplicate) to tag them all.
To mark duplicate images: Near-duplicate images are shown side by side. Navigate through these images and mark whichever is of interest to you.

Implementation on GitHub.

Width

Ranks images by the width of the image. Implementation on GitHub.
