Initialising from Image Directory
If you have an image dataset stored on your local system already, you can initialise a project from that dataset with the init
command.
The command will automatically run all the existing metrics on your data.
The main argument is the root of your dataset directory.
encord-active init /path/to/dataset
If you add the --dryrun
option:
# after command 👇 before the path
encord-active init --dryrun /path/to/dataset
No project will be initialised but all the files that would be included will be listed. You can use this for making sure that Encord Active will include what you expect before starting the actual initialisation.
You have some additional options to tailor the initialisation of the project for your needs.
Including Labels​
If you want to include labels as well, this is also an option.
To do so, you will have to define how to parse your labels.
You do this by implementing the LabelTransformer
interface.
from pathlib import Path
from typing import List
from encord_active.lib.labels.label_transformer import (
BoundingBoxLabel,
ClassificationLabel,
DataLabel,
LabelTransformer,
PolygonLabel
)
class MyTransformer(LabelTransformer):
def from_custom_labels(self, label_files: List[Path], data_files: List[Path]) -> List[DataLabel]:
# your implementation goes here
...
Here is an example of inferring classifications from the file structure of the images. Let's say you have your images stored in the following structure:
/path/to/data_root
├── cat
│  ├── 0.jpg
│  └── ...
├── dog
│  ├── 0.jpg
│  └── ...
└── horse
  ├── 0.jpg
  └── ...
Your implementation would look similar to:
# classification_transformer.py
from pathlib import Path
from typing import List
from encord_active.lib.labels.label_transformer import (
ClassificationLabel,
DataLabel,
LabelTransformer,
)
class ClassificationTransformer(LabelTransformer):
def from_custom_labels(self, _, data_files: List[Path]) -> List[DataLabel]:
return [DataLabel(f, ClassificationLabel(class_=f.parent.name)) for f in data_files]
And the CLI command:
encord-active init --transformer classification_transformer.py /path/to/data_root
More concrete examples for, e.g., bounding boxes and polygons, are included in our example directory on GitHub.
CLI Options​
--data-glob
(or -dg
)​
Default:
"**/*.jpg"
,"**/*.jpeg"
,"**/*.png"
,"**/*.tiff"
Glob patterns used to choose data files, i.e., images. You can specify multiple options if you wish to include files from specific subdirectories. For example,
encord-active init -dg "val/*.jpg" -dg "test/*.jpg" /path/to/dataset
would match jpg
files in the val
and test
directories but not in the train
directory of the following file structure:
/path/to/dataset
├── train
│  ├── 0.jpg
│  └── ...
├── val
│  ├── 0.jpg
│  └── ...
└── test
  ├── 0.jpg
  └── ...
The matched files will be passed to the LabelTransformer.from_custom_labels
method for you to parse and transform into DataLabel
objects -- if you use the --transformer
option.
--label-glob
(or -lg
)​
Default:
None
Glob patterns used to choose data files, i.e., files that contain label information. You can specify multiple options if you wish to include files from specific subdirectories.
For example,
encord-active init -lg "val/*.txt" -lg "test/*.txt" /path/to/dataset
would match txt
files in the val
and test
directories but not in the train
directory of the following file structure:
/path/to/dataset
├── train
│  ├── 0.jpg
│  ├── 0.txt
│  └── ...
├── val
│  ├── 0.jpg
│  ├── 0.txt
│  └── ...
└── test
  ├── 0.jpg
  ├── 0.txt
  └── ...
The files will be passed to the LabelTransformer.from_custom_labels
method for you to parse and transform into DataLabel
objects.
--target
(or -t
)​
Default: the current working directory
Directory where the project would be saved.
The project will be stored in a directory within the target directory (see the --name
option for details).
--name
(or -n
)​
Default: the name of the data directory prepended with
"[EA] "
Name to give the new Encord Active project directory.
For example, if you run
encord-active init --target foo/bar --name baz /path/to/dataset
A new directory named baz
will be generated in the foo/bar
directory containing the project files.
--symlinks
​
Default:
False
Use symlinks instead of copying images to the Encord Active project directory. This will save you a lot of space on your system.
If you later move the data that the symlinks are pointing to, the Encord Active project will stop working - unless you manually update the symlinks.
--transformer
​
Default:
None
A file path to a python implementation of the LabelTransformer
interface.
This file can contain one or more implementations, and you will be able to choose interactively from the UI which one you would like to apply during your project initialization.
To see some reference implementations, please visit our GitHub examples directory.
--dryrun
​
Default:
False
Print the files that will be imported WITHOUT actually creating a project. This option is helpful if you want to validate that you are actually importing the correct data.