Task Agents (SDK)

TLDR;

Do you already know what you are doing and only want to look over a Jupyter Notebook example? We provide one here.

Task Agents (https://app.encord.com/agents/task-agents or https://app.us.encord.com/agents/task-agents) are Workflow components in which a custom operation on all tasks in the Agent stage can be triggered. This allows you to set up pre-labeling, such as using foundation models such as GPT-4o, automated quality assurance, or any other custom action you need for your workflow.

Tasks in the Agent stage are processed in the order of their task priority.

Adding Task Agents to your Workflow

To add Task Agents to your Workflow you must add an Agent stage component to your Workflow.

Connect the Agent stage in the required place in your Workflow.
Click + next to Add pathway and give the pathway a meaningful name. Repeat this step for each additional pathway you want to add. Pathways allow you to connect the Agent stage to other workflow components.

Coming Soon! We’re currently evaluating webhooks for the Agent stage. Please let us know if this would help your use-case by reaching out to support@encord.com.

Configuring Task Agents

Use the Encord SDK to configure your Task Agent. The Task Agent executes the configured SDK script for all tasks that are routed through the Task Agent stage in your Workflow.

General Example

The General Example script shows how to configure a Task Agent with the name Agent 1 and with a pathway called continue to Review.

Agent nodes can be used in conjunction with bundles to efficiently handle bulk actions

Ensure that you:

Replace <private_key_path> with the path to your private key.
Replace <project_hash> with the hash of your Project.
Insert your custom logic where the comment instructs you to do so.

General Example

# Import dependencies
from encord.user_client import EncordUserClient
from encord.workflow import AgentStage

# Authenticate using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
ssh_private_key_path="<private_key_path>"
)

# Specify the Project that contains the Task agent. Replace <project_hash> with the hash of your Project
project = user_client.get_project(<project_hash>)

# Specify the Task Agent
agent_stage = project.workflow.get_stage(name="Agent 1", type_=AgentStage)
for task in agent_stage.get_tasks():

    # Now you have the agent task containing the data hash
    # Insert you custom logic here

# When the custom logic is completed, the task can be moved forward to the selected pathway
task.proceed(pathway_name="continue to Review")

Pre-Classification of Images Using GPT 4o

See our end-to-end guide for Pre-Classification using GPT 4o for more detailed information.

The Pre-Classification script uses GPT 4o mini to route images to different annotation stages depending on what it is contained in the image. The Pre-Classification script applies to the following Workflow.

Agent nodes can be used in conjunction with bundles to efficiently handle bulk actions

In the following script:

Replace <private_key_path> with the hash of your private key.
Replace <project_hash> with the hash of your Project.
Replace Agent 1 with the name of your Agent stage.

# Import dependencies
from encord.user_client import EncordUserClient
from encord.workflow import AgentStage
import openai
import base64
import requests
import json

# Initialize your OpenAI client
openai.api_key = "<your_openai_api_key>"

def get_classification_from_the_model(media_content):
    """
    Example function that passes media to OpenAI's ChatGPT API along with the prompt
    and parses the result.
    """
    prompt = """
    You are an image analysis expert. You're working on a project that includes annotation of different pets images.
    Your task is to assign one of the following tags to the image: "cat", "dog", "other".

    Reply in JSON format of the following structure: { "classification": cat|dog|other }
    """

    completion = openai.ChatCompletion.create(
        model="gpt-4o-mini",
        messages=[
            ChatCompletionSystemMessageParam(role="system", content=prompt),
            ChatCompletionUserMessageParam(
                role="user",
                content=[
                    ChatCompletionContentPartImageParam(
                        image_url=ImageURL(url=f"data:image/jpeg;base64,{media_content}", detail="auto"),
                        type="image_url",
                    )
                ]
            ),
        ],
        response_format=ResponseFormat(type="json_object"),
        max_tokens=1000,
    )

    raw_text_completion = completion.choices[0].message.content
    try:
        parsed_result = json.loads(raw_text_completion)
        return parsed_result["classification"].lower()
    except Exception as e:
        print(f"Failed to process the model response: {e}")
        return None


# Authenticate using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify the Project that contains the Task agent. Replace <project_hash> with the hash of your Project
project = user_client.get_project("<project_hash>")

radio_classification = project.ontology_structure.get_child_by_title(
title="Animal",
type_=Classification,
)

cat_option = radio_ontology_classification.get_child_by_title(
title="Cat", type_=Option
)

dog_option = radio_ontology_classification.get_child_by_title(
title="Dog", type_=Option
)

# Specify the Task Agent
agent_stage = project.workflow.get_stage(name="Agent 1", type_=AgentStage)

for task in agent_stage.get_tasks():
    # Got a task for the following data unit
    print(f"{task.data_hash} -> {task.data_title}")

    # Getting a label row for the data unit
    label_row = project.list_label_rows_v2(data_hashes=[task.data_hash])[0]
    label_row.initialise_labels(include_signed_url=True)

    # Downloading the media:
    media_response = requests.get(label_row.data_link)
    media_content = base64.b64encode(media_response.content).decode("utf-8")

    # Now we can send the media to OpenAI:
    model_response = get_classification_from_the_model(media_content)

    # And interpret the result:
    match model_response:
        case "cat":

            # Create a classification instance
            classification_instance = (
                radio_ontology_classification.create_instance()
            )

            radio_classification_instance.set_answer(
                answer=cat_option
            )
            
            label_row.add_classification_instance(radio_classification_instance)
            label_row.save()

            task.proceed(pathway_name="Cat")
        case "dog":

            # Create & save classification instance
            classification_instance = (
                radio_ontology_classification.create_instance()
            )
        
            radio_classification_instance.set_answer(
                answer=dog_option
            )

            
            label_row.add_classification_instance(radio_classification_instance)
            label_row.save()

            task.proceed(pathway_name="Dog")
        case _:
            task.proceed(pathway_name="Other")

Pre-Labeling Videos Using a Mock Model

This guide makes the following assumptions:

You have a model that takes video frames as an input and provides bounding box coordinates and confidence scores as an output.
You have installed the encord-agents library using the following command:

python -m pip install encord-agents

The Pre-Labeling Script selects a random class from the Ontology, generates random bounding box labels, and applies random confidence scores for video frames before advancing the videos to the annotation stage (Annotate 1). Below is an example of a Workflow where the Pre-Labeling agent can be effectively utilized.

To authenticate, you must set either of following environment variables in the environment that you plan to run your agents.

ENCORD_SSH_KEY: Containing the raw private key file content
ENCORD_SSH_KEY_FILE: Containing the absolute path to the private key file

In the following script:

Replace <project_hash> with the hash of your Project.
Replace the mock model with your own model, and adapt the rest of the script according to your needs.
If you choose to give your python file a different name, ensure you replace all references to prelabel_video.py with your new file name.

prelabel_video.py

#Import Dependencies
import random
from dataclasses import dataclass
from typing import Iterable

import numpy as np
from encord.objects.coordinates import BoundingBoxCoordinates
from encord.objects.ontology_labels_impl import LabelRowV2
from encord.project import Project
from encord_agents.core.data_model import Frame
from encord_agents.tasks import Depends, Runner
from encord_agents.tasks.dependencies import dep_video_iterator
from numpy.typing import NDArray
from typing_extensions import Annotated


# Set the Environment variable to authenticate with Encord
ENCORD_SSH_KEY_FILE="/path/to/file/" prelabel_video.py

runner = Runner(project_hash="<project_hash>")

# === BEGIN MOCK MODEL === #
@dataclass
class ModelPrediction:
    label: int
    coords: BoundingBoxCoordinates
    conf: float


def fake_predict(image: NDArray[np.uint8]) -> list[ModelPrediction]:
    return [
        ModelPrediction(
            label=random.choice(range(3)),
            coords=BoundingBoxCoordinates(
                top_left_x=random.random() * 0.5,
                top_left_y=random.random() * 0.5,
                width=random.random() * 0.5,
                height=random.random() * 0.5,
            ),
            conf=random.random() + 0.5,
        )
        for _ in range(10)
    ]


model = fake_predict
# === END MOCK MODEL === #

@runner.stage(stage="Pre-Labeling Agent")
def run_something(
    lr: LabelRowV2,
    project: Project,
    frames: Annotated[Iterable[Frame], Depends(dep_video_iterator)],
) -> str:
    ontology = project.ontology_structure

    # Create an object instance for each frame in the video and save the labels. 
    for frame in frames:
        outputs = model(frame.content)
        for output in outputs:
            ins = ontology.objects[output.label].create_instance()
            ins.set_for_frames(
                frames=frame.frame, coordinates=output.coords, confidence=output.conf
            )

            lr.add_object_instance(ins)

    lr.save()

    # Return the name of the stage the task should progress to
    return "Annotate 1"  


if __name__ == "__main__":
    runner.run()

Triggering the Task Agent

Webhooks are coming soon, allowing you to trigger Task Agents manually.

Task Agents aggregate all tasks that reach the Agent stage in the workflow. Your custom script must be triggered at this stage before the tasks proceed further in the workflow.

End-to-End Agent Example

We provide end-to-end examples using Jupyter Notebooks here.

Get Started

General

Index

Ontologies

Projects

Labels

Datasets

DICOM

Active API & SDK

Task Agents (SDK)

TLDR;

Adding Task Agents to your Workflow

Configuring Task Agents

Triggering the Task Agent

End-to-End Agent Example

Get Started

General

Index

Ontologies

Projects

Labels

Datasets

DICOM

Active API & SDK

​TLDR;

​Adding Task Agents to your Workflow

​Configuring Task Agents

​Triggering the Task Agent

​End-to-End Agent Example

TLDR;

Adding Task Agents to your Workflow

Configuring Task Agents

Triggering the Task Agent

End-to-End Agent Example