graph LR
Eager_Execution_Interceptor["Eager Execution Interceptor"]
Tensor_Abstraction_Layer["Tensor Abstraction Layer"]
Lazy_Computational_Graph_Engine["Lazy Computational Graph Engine"]
Computational_Graph_Pre_processor["Computational Graph Pre-processor"]
GPU_Memory_Optimizer["GPU Memory Optimizer"]
Eager_Execution_Interceptor -- "passes wrapped tensors to" --> Lazy_Computational_Graph_Engine
Tensor_Abstraction_Layer -- "provides interface for" --> Lazy_Computational_Graph_Engine
Lazy_Computational_Graph_Engine -- "uses interface from" --> Tensor_Abstraction_Layer
Lazy_Computational_Graph_Engine -- "requests analysis from" --> Computational_Graph_Pre_processor
Computational_Graph_Pre_processor -- "returns metadata to" --> Lazy_Computational_Graph_Engine
Lazy_Computational_Graph_Engine -- "consults" --> GPU_Memory_Optimizer
GPU_Memory_Optimizer -- "provides strategy to" --> Lazy_Computational_Graph_Engine
click Eager_Execution_Interceptor href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/koila/Eager_Execution_Interceptor.md" "Details"
click Tensor_Abstraction_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/koila/Tensor_Abstraction_Layer.md" "Details"
click Lazy_Computational_Graph_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/koila/Lazy_Computational_Graph_Engine.md" "Details"
click Computational_Graph_Pre_processor href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/koila/Computational_Graph_Pre_processor.md" "Details"
Koila's architecture centers around a lazy evaluation paradigm to optimize GPU memory usage for PyTorch models. It intercepts standard PyTorch operations via the Eager Execution Interceptor, converting them into a deferred computational graph managed by the Lazy Computational Graph Engine. This engine orchestrates the entire lazy execution flow, leveraging the Tensor Abstraction Layer for consistent tensor handling. Before actual computation, the Computational Graph Pre-processor analyzes and optimizes the graph's metadata, while the GPU Memory Optimizer dynamically determines efficient batch sizes, ensuring computations fit within available memory. This integrated approach allows Koila to enhance PyTorch's memory efficiency without requiring significant code changes from the user.
Eager Execution Interceptor [Expand]
Integrates Koila with PyTorch's eager execution mode by intercepting __torch_function__ calls. It wraps standard PyTorch tensors, allowing them to be seamlessly used within Koila's lazy evaluation system without requiring significant changes to existing PyTorch code.
Related Classes/Methods:
Tensor Abstraction Layer [Expand]
Provides a unified interface for accessing fundamental tensor properties (shape, data type, device, batch information) for both eager PyTorch tensors and Koila's lazy tensor representations. It defines the core protocols (Runnable, TensorMixin, RunnableTensor) and utility functions for consistent tensor interaction within Koila.
Related Classes/Methods:
Lazy Computational Graph Engine [Expand]
The central orchestrator for Koila's lazy evaluation. It intercepts PyTorch operations, builds a deferred computational graph, manages its execution, and integrates pre-computation analysis and GPU memory management. This component is responsible for the overall flow of lazy operations.
Related Classes/Methods:
Computational Graph Pre-processor [Expand]
Analyzes and transforms the computational graph's metadata (e.g., shapes, batch dimensions) before actual tensor computation. This component is crucial for optimizing memory and ensuring correct tensor operations by inferring and propagating shape and batch information through the graph.
Related Classes/Methods:
Dynamically manages GPU memory by determining optimal batch sizes and freeing up resources. It ensures that computations fit within available GPU memory, often by adjusting batch sizes based on memory constraints.
Related Classes/Methods: