Label Text and HTML Files

Text files include formats such as .txt, .md, .rst, .xml, .html, .json, and more. However, HTML files are categorized slightly differently from other text file types. As a result, this documentation is divided into separate Text and HTML sections.
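As a rough illustration of the split described above, the sketch below sorts file names into a "text" or "html" category by extension. The helper `categorize_file` and both extension sets are hypothetical names for this example only and are not part of Encord's SDK. The extensions come from the list above, which ends with "and more", so the sets are not exhaustive.

```python
from pathlib import Path

# Illustrative sets built from the extensions named in the text above.
# Assumption: the real list of supported formats is longer ("and more").
TEXT_EXTENSIONS = {".txt", ".md", ".rst", ".xml", ".json"}
HTML_EXTENSIONS = {".html"}


def categorize_file(filename: str) -> str:
    """Return "html", "text", or "unknown" based on the file extension.

    Hypothetical helper for illustration; not an Encord API.
    """
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix in HTML_EXTENSIONS:
        return "html"
    if suffix in TEXT_EXTENSIONS:
        return "text"
    return "unknown"
```

For example, `categorize_file("page.html")` returns `"html"`, while `categorize_file("notes.md")` returns `"text"`.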

To learn how to import text and HTML files, see our documentation here.

The following types of labels can be applied to all text files, and at least one must be present in your project’s ontology to enable text labeling:

  • Text Region: Object labels applied to a specific region within the text file.
  • Classifications: Classification labels applied to the entire file.

Text

HTML

Encord supports both raw HTML files and single-extension HTML files. The key difference is that single-extension HTML files include all the necessary elements to render the webpage, such as CSS and JavaScript.

This video tutorial demonstrates how to label raw HTML. The process for labeling rendered HTML is identical.