Learn how to set up the components of your active learning process in Encord

ℹ️

Note

Active learning workflows in the Encord platform are designed specifically for workflow projects. This requirement lets tasks move seamlessly between stages such as label, review, and complete when you use the SDK.

Active learning workflows in Encord Active share the following key stages:

  1. Initialization.
  2. Prioritizing high-value data to label.
  3. Model training and update.

If you prefer to see an active learning workflow in action, take a look at the end-to-end tutorial for MNIST.

Initialization

To start an active learning workflow, you need an initial labeled dataset for training the machine learning model. In the Encord platform, this corresponds to having a project with annotations.

If you do not have any Projects yet, watch the tutorial video on setting up a workflow Project to get started quickly.

Choose an Encord project

To proceed, pull the project into Encord Active. Run the following CLI command and confirm that you want to include uninitialized label rows, as they represent unannotated data.

encord-active import project

If you require detailed information on the options available during the import process, you can refer to the Import from Encord platform guide.

If your workflow project already contains annotations, you can proceed directly to Model training and update.

Prioritize high-value data to label

If your project does not have any annotations, or you are seeking the most appropriate data for labeling, it's essential to score and rank your data.
While random selection is a possibility, Encord Active provides metrics such as Image Diversity to enhance and optimize annotation impact.
This metric ranks images based on their ease of annotation, enabling prioritization of suitable and manageable data.

👍

Tip

Check out the quality metrics page for a comprehensive overview of available metrics in Encord Active, including the acquisition functions used for sample selection.

For example, you can follow these steps to prioritize labeling for data with the lowest Image Diversity score using the UI:

  1. In the Data Quality explorer page, navigate to the toolbox and click on the Filter tab.
  2. Select the option that corresponds to the first labeling stage (usually named Annotate 1) under the Workflow Stage metadata filter to pick the unannotated data.
  3. Add the Image Diversity filter and adjust the slider to select a subset of data with the lowest score.
  4. Access the Action tab in the toolbox.
  5. Click on the 🖋 Relabel button and follow the instructions to prioritize labeling for the selected data.
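
The selection made with the slider in step 3 amounts to ranking the unannotated data by score and keeping the lowest-scoring part. The sketch below illustrates that idea in plain Python. It does not call Encord Active: the function `select_lowest_scoring`, the image IDs, and the score values are all hypothetical and used only for illustration.

```python
from typing import Dict, List


def select_lowest_scoring(scores: Dict[str, float], fraction: float) -> List[str]:
    """Return the IDs of the lowest-scoring `fraction` of items, lowest first."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Rank item IDs by ascending score.
    ranked = sorted(scores, key=lambda item_id: scores[item_id])
    # Keep at least one item whenever any scores were provided.
    count = max(1, int(len(ranked) * fraction))
    return ranked[:count]


# Hypothetical Image Diversity scores for four unannotated images.
image_diversity = {"img_a": 0.91, "img_b": 0.12, "img_c": 0.47, "img_d": 0.05}

# Keep the lowest-scoring half as the next batch to label.
to_label_next = select_lowest_scoring(image_diversity, fraction=0.5)
print(to_label_next)
```

Choosing a fraction rather than a fixed score threshold keeps the batch size predictable even when the score distribution shifts between iterations.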

ℹ️

Note

Task prioritization for labeling is currently a closed-beta feature in the Encord platform. To learn more about this feature, reach out to us on Slack or via email.

Nevertheless, to mimic the behavior of task prioritization in projects with only single images, you can follow these steps:

  1. In the Filter tab, select the option that corresponds to the first labeling stage (usually named Annotate 1) under the Workflow Stage metadata filter to pick the data ready to be labeled.
  2. Use the bulk tagging feature to mark them with a data tag, such as unlabeled.
  3. Add the Image Diversity filter and adjust the slider to select a subset of filtered data with the lowest score.
  4. Use the bulk tagging feature to mark this further selection with a data tag, such as to label next.
  5. Reset the filters and choose the unlabeled tag option under Tags.
  6. Access the Action tab in the toolbox, click on the ✅ Mark as Complete button, and follow the instructions to temporarily move all the selected data to the workflow's Complete stage.
  7. Return to the Filter tab, reset the filters and choose the to label next tag option under Tags.
  8. Access the Action tab in the toolbox again, click on the 🖋 Relabel button and follow the instructions to move the selected data to the workflow's first annotation stage.
  9. Once the selected data has been labeled, use the following filter combination to bring back the remaining data from the Complete stage to the first labeling stage as in step (8):
    Select the No class option under Object Class and choose the proper tag name (e.g. unlabeled) option under Tags.

By following these steps, you can ensure that the first labeling stage contains only the prioritized data for labeling, and the task states align at the end with the flow that utilizes the task prioritization feature.
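
To make the bookkeeping in the steps above easier to follow, here is a small in-memory simulation of how the two tags and the stage moves interact. It is only a model of the procedure, not Encord code: the `Task` class, the `has_labels` flag (standing in for the No class filter), and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set

ANNOTATE, COMPLETE = "Annotate 1", "Complete"


@dataclass
class Task:
    name: str
    score: float                      # e.g. an Image Diversity score
    stage: str = ANNOTATE
    tags: Set[str] = field(default_factory=set)
    has_labels: bool = False          # False corresponds to "No class"


def prioritize(tasks: List[Task], count: int) -> None:
    """Steps 1-8: leave only the `count` lowest-scoring tasks in Annotate 1."""
    ready = [t for t in tasks if t.stage == ANNOTATE]        # step 1
    for t in ready:
        t.tags.add("unlabeled")                               # step 2
    for t in sorted(ready, key=lambda t: t.score)[:count]:    # step 3
        t.tags.add("to label next")                           # step 4
    for t in tasks:
        if "unlabeled" in t.tags:                             # steps 5-6
            t.stage = COMPLETE
    for t in tasks:
        if "to label next" in t.tags:                         # steps 7-8
            t.stage = ANNOTATE


def restore_remaining(tasks: List[Task]) -> None:
    """Step 9: move tagged tasks that still have no labels back to Annotate 1."""
    for t in tasks:
        if "unlabeled" in t.tags and not t.has_labels:
            t.stage = ANNOTATE


tasks = [Task("a", 0.9), Task("b", 0.1), Task("c", 0.5)]
prioritize(tasks, count=1)
stages_after_prioritize = {t.name: t.stage for t in tasks}

# Simulate an annotator labeling the prioritized task and completing it.
tasks[1].has_labels = True
tasks[1].stage = COMPLETE

restore_remaining(tasks)
stages_after_restore = {t.name: t.stage for t in tasks}
print(stages_after_prioritize)
print(stages_after_restore)
```

The `unlabeled` tag is what makes step 9 possible: without it, data that was parked in the Complete stage could not be told apart from data that was genuinely completed.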

Model training and update

In the active learning workflow, model training plays a crucial role. It involves training a machine learning model using the initial labeled dataset and iteratively updating it with newly labeled data. Encord Active provides support for a wide range of models by allowing you to plug in your own model and interface with it using convenient wrappers.

More information can be found here.
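
As a rough illustration of the train, select, label, and retrain cycle described in this section, the sketch below runs three rounds on toy one-dimensional data. Everything in it is an assumption made for the example: `ThresholdModel` is a stand-in for your own model and is not one of Encord Active's wrappers, `oracle` replaces human annotators, and least-confidence sampling is just one possible acquisition function.

```python
import math
from typing import List, Sequence, Tuple


class ThresholdModel:
    """Toy 1-D binary classifier standing in for a user-supplied model."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def fit(self, xs: Sequence[float], ys: Sequence[int]) -> None:
        # Place the decision boundary midway between the two class means.
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict_proba(self, x: float) -> float:
        # Logistic curve centred on the boundary, read as P(class 1).
        return 1.0 / (1.0 + math.exp(-(x - self.threshold)))


def least_confident(model: ThresholdModel, pool: List[float], k: int) -> List[float]:
    """Pick the k pool items whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(model.predict_proba(x) - 0.5))[:k]


def oracle(x: float) -> int:
    """Stand-in for human annotation: the true class boundary is at 2.0."""
    return int(x > 2.0)


# Initialization: a small labeled set plus an unlabeled pool.
labeled: List[Tuple[float, int]] = [(-4.0, 0), (6.0, 1)]
pool = [-3.0, -1.0, 0.5, 1.5, 2.5, 4.0, 5.0]
model = ThresholdModel()

for _ in range(3):
    xs, ys = zip(*labeled)
    model.fit(xs, ys)                          # train or update the model
    for x in least_confident(model, pool, k=2):
        pool.remove(x)                         # send the item for labeling
        labeled.append((x, oracle(x)))         # receive its label

# Final update with everything labeled so far.
model.fit(*zip(*labeled))
print(len(labeled), pool, model.threshold)
```

In a real project the `oracle` call corresponds to a round trip through the labeling stage, so each loop iteration ends only once the prioritized tasks have been annotated.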