# Gen AI Data Lifecycle

> Design an end-to-end Gen AI data lifecycle that supports grounding, alignment, and continuous improvement.

Gen AI systems improve through **tight feedback loops**, not one-off training runs. This page outlines a five-stage lifecycle for building reliable Gen AI pipelines: ingest, curate, annotate, evaluate, and close the feedback loop.

***

## 1. Ingest unstructured data

Start by centralizing all relevant sources:

* Documents (PDFs, HTML, knowledge bases)
* Text datasets
* Audio transcripts
* Images and multimodal assets
* Metadata describing source, freshness, and trust
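
To make the metadata bullet concrete, here is a minimal sketch of a per-asset record that captures source, freshness, and trust. The `SourceRecord` class, its field names, the 0.0 to 1.0 trust scale, and the `is_stale` helper are all hypothetical illustrations, not part of any Encord API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical metadata attached to one ingested asset."""
    uri: str
    modality: str       # e.g. "document", "text", "audio_transcript", "image"
    source: str         # where the asset came from
    last_updated: date  # freshness signal
    trust: float        # assumed scale: 0.0 (unverified) to 1.0 (authoritative)


def is_stale(record: SourceRecord, today: date, max_age_days: int = 365) -> bool:
    """Return True when the asset is older than the allowed age."""
    return (today - record.last_updated).days > max_age_days


record = SourceRecord(
    uri="kb/refund-policy.pdf",
    modality="document",
    source="internal-knowledge-base",
    last_updated=date(2023, 1, 1),
    trust=0.9,
)
stale = is_stale(record, today=date(2025, 1, 1))
```

Recording these fields at ingest time means later stages can filter on them rather than re-deriving them from file contents.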

**Recommended docs**

* [Files](/platform-documentation/Curate/index-files)
* [Supported Data](/platform-documentation/General/general-supported-data)
* [Custom Metadata](/platform-documentation/Curate/custom-metadata/index-metadata-schema)

***

## 2. Curate for grounding and quality

Not all data should be used for retrieval or training.

Curation focuses on:

* Removing duplicates and low-signal data
* Identifying hallucination-prone sources
* Grouping content by domain or intent
* Selecting data for targeted evaluation
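
As a rough illustration of the first bullet, the sketch below drops exact duplicates (after whitespace and case normalization) and very short, low-signal texts. The `curate` function and the `min_words` threshold are assumptions made for this example; near-duplicate detection in practice usually relies on embeddings rather than exact hashes.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def curate(texts: list[str], min_words: int = 5) -> list[str]:
    """Keep the first copy of each distinct text that has at least min_words words."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in texts:
        norm = normalize(text)
        if len(norm.split()) < min_words:
            continue  # low-signal: too short to be useful
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of something already kept
        seen.add(digest)
        kept.append(text)
    return kept


docs = [
    "Refunds are processed within five business days.",
    "refunds   are processed within FIVE business days.",
    "Click here",
]
curated = curate(docs)
```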

**Recommended docs**

* [Getting Started with Index](/platform-documentation/Curate/index-getting-started)
* [Embedding Plots](/platform-documentation/Curate/embedding-plots)
* [Collections](/platform-documentation/Curate/curation-basics)

***

## 3. Annotate feedback and intent

Human feedback is central to Gen AI alignment:

* Classification (correct / incorrect / unsafe)
* Ranking and preference selection
* Structured explanations
* Instruction-following evaluation
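
One possible shape for a feedback record combining these types is sketched below. The `Verdict` enum mirrors the correct / incorrect / unsafe classes above; the `Feedback` class, its field names, and the `preference_rate` helper are hypothetical and do not describe an Encord ontology.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    UNSAFE = "unsafe"


@dataclass
class Feedback:
    """Hypothetical record of one annotator judgment on a prompt."""
    prompt_id: str
    verdict: Verdict         # classification of the shown response
    preferred_response: str  # "a" or "b" in a pairwise comparison
    explanation: str = ""    # optional structured rationale


def preference_rate(records: list[Feedback], response: str = "a") -> float:
    """Fraction of records in which the given response was preferred."""
    if not records:
        return 0.0
    return sum(r.preferred_response == response for r in records) / len(records)


records = [
    Feedback("p1", Verdict.CORRECT, "a"),
    Feedback("p2", Verdict.INCORRECT, "b", "Cites a retired policy."),
    Feedback("p3", Verdict.CORRECT, "a"),
    Feedback("p4", Verdict.UNSAFE, "b"),
]
rate_a = preference_rate(records, "a")
```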

**Recommended docs**

* [Ontologies](/platform-documentation/Annotate/annotate-ontologies/annotate-ontologies)
* [Annotate & Review](/platform-documentation/Annotate/annotate-label-editor/annotate-label-editor-annotate)
* [Create a Project](/platform-documentation/GettingStarted/gettingstarted-create-project)

***

## 4. Evaluate model behavior

Evaluation should be continuous and comparative:

* Prompt-level performance
* Dataset-level trends
* Model-to-model comparisons
* Regression detection
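
Regression detection and model-to-model comparison can be sketched as a per-prompt score diff. The example assumes each model run produces a dictionary of prompt ID to score in the range 0 to 1; the `find_regressions` function and the `tolerance` value are illustrative assumptions.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, float]:
    """Return prompts whose score dropped by more than tolerance, with the size of the drop."""
    regressions: dict[str, float] = {}
    for prompt_id, base_score in baseline.items():
        if prompt_id not in candidate:
            continue  # only compare prompts scored by both models
        drop = base_score - candidate[prompt_id]
        if drop > tolerance:
            regressions[prompt_id] = drop
    return regressions


baseline_scores = {"p1": 0.75, "p2": 0.5, "p3": 1.0}
candidate_scores = {"p1": 0.5, "p2": 0.75, "p3": 1.0}
regressions = find_regressions(baseline_scores, candidate_scores)
```

Prompts flagged this way are natural candidates for the targeted re-curation and edge-case expansion described in the next stage.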

**Recommended docs**

* [Model Evaluation](/platform-documentation/Validation/active-how-to/active-model-predictions-eval)
* [Quality Metrics](/platform-documentation/Validation/label-validation-basics#quality-metrics)
* [Analytics View](/platform-documentation/Validation/label-validation-basics#analytics-view)

***

## 5. Close the feedback loop

Evaluation insights drive the next cycle:

* Re-curate data
* Expand edge-case coverage
* Refine feedback schemas
* Update prompts or retrieval sources

This loop repeats as models and requirements evolve.

***

## Key takeaway

Reliable Gen AI is not a single model; it’s a **living system**:

> Ingest → Curate → Annotate → Evaluate → Improve → Repeat
