graph LR
flair_data_DataPoint["flair.data.DataPoint"]
flair_data_Sentence["flair.data.Sentence"]
flair_data_Corpus["flair.data.Corpus"]
flair_tokenization_Tokenizer["flair.tokenization.Tokenizer"]
flair_data_Dictionary["flair.data.Dictionary"]
flair_embeddings_base_Embeddings["flair.embeddings.base.Embeddings"]
flair_nn_model_Model["flair.nn.model.Model"]
flair_trainers_trainer_ModelTrainer["flair.trainers.trainer.ModelTrainer"]
flair_data_Sentence -- "inherits from" --> flair_data_DataPoint
flair_embeddings_base_Embeddings -- "populates" --> flair_data_DataPoint
flair_data_Corpus -- "collects and manages" --> flair_data_Sentence
flair_tokenization_Tokenizer -- "processes raw text into" --> flair_data_Sentence
flair_trainers_trainer_ModelTrainer -- "consumes data from" --> flair_data_Corpus
flair_data_Corpus -- "builds" --> flair_data_Dictionary
flair_data_Sentence -- "uses" --> flair_tokenization_Tokenizer
flair_nn_model_Model -- "stores and utilizes" --> flair_tokenization_Tokenizer
flair_nn_model_Model -- "uses" --> flair_data_Dictionary
flair_nn_model_Model -- "composes" --> flair_embeddings_base_Embeddings
flair_trainers_trainer_ModelTrainer -- "orchestrates" --> flair_nn_model_Model
flair_nn_model_Model -- "consumes" --> flair_data_DataPoint
click flair_data_DataPoint href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//flair/flair_data_DataPoint.md" "Details"
click flair_data_Corpus href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//flair/flair_data_Corpus.md" "Details"
click flair_tokenization_Tokenizer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//flair/flair_tokenization_Tokenizer.md" "Details"
click flair_embeddings_base_Embeddings href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//flair/flair_embeddings_base_Embeddings.md" "Details"
click flair_nn_model_Model href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//flair/flair_nn_model_Model.md" "Details"
The architecture of flair is designed around a clear separation of concerns, with core components handling data representation, preprocessing, embedding, model definition, and training orchestration. Together, these components form a cohesive and modular architecture that allows Flair to efficiently handle various NLP tasks, from data loading and preprocessing to model training and prediction.
The foundational abstract base class for all data units in Flair (e.g., Token, Sentence). It defines the common interface for storing embeddings, managing various annotation layers, and providing basic textual and positional information, ensuring a consistent way to attach numerical representations and symbolic labels.
Related Classes/Methods:
A concrete implementation of DataPoint representing a sequence of tokens, typically a single sentence. It manages the raw text, a list of Token objects, and can hold various linguistic annotations (sentence-level labels, spans). It handles lazy tokenization, ensuring tokens are generated only when needed.
Related Classes/Methods:
The central container for managing datasets, typically split into training, development (dev), and testing sets. It provides methods for sampling, filtering, and generating Dictionary objects from the data, and ensures all necessary splits are available for a training run.
Related Classes/Methods:
An abstract base class that defines the contract for tokenizing raw text. Subclasses implement specific tokenization algorithms, providing the tokenize method to split a string into a list of string tokens. It's essential for preparing raw text for further processing.
Related Classes/Methods:
A utility class responsible for creating and managing a mapping between unique string items (like words, characters, or labels) and their corresponding integer IDs. This is crucial for converting symbolic data into numerical representations required by neural networks.
Related Classes/Methods:
The abstract base class for all embedding modules in Flair. It defines the embed method, which takes DataPoint objects (typically Sentence or Token) and populates them with dense vector representations, handling whether embeddings are static or need recomputation.
Related Classes/Methods:
The abstract base class for all neural network models in Flair that perform downstream NLP tasks (e.g., SequenceTagger, TextClassifier). It defines core functionalities required for training and inference, including label_type, forward_loss (for computing loss), and evaluate (for performance assessment). It also manages model saving/loading.
Related Classes/Methods:
The central orchestration component responsible for managing the entire training and evaluation lifecycle of a Flair model. It takes a flair.nn.Model and a flair.data.Corpus, handles the training loop (epochs, mini-batches), manages the optimizer and learning rate scheduler, and performs evaluations on development and test sets.
Related Classes/Methods: