refactor: replace df.attrs with typed pipeline context dataclass #4

@lipikaramaswamy

Priority Level

Medium

Task Summary

Stop using df.attrs (an experimental pandas feature) to thread pipeline metadata through workflow stages. Currently original_text_column is carried via df.attrs, which is not reliably preserved across merge/concat/groupby operations, so the value can be silently lost between stages.

Technical Details & Implementation Plan

Create a PipelineContext (or similarly named) dataclass that wraps a DataFrame and a metadata dict. Update read_input, _run_internal, LlmReplaceWorkflow, _rename_output_columns, and _build_user_dataframe to pass and return this container instead of relying on df.attrs. Remove the .attrs comment/TODO at llm_replace_workflow.py:87.
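
A minimal sketch of what such a container could look like, to make the proposal concrete. This is only an illustration under assumptions: the field names (df, original_text_column, metadata) and the with_df helper are hypothetical and not taken from the existing code. pandas is imported for type checking only, so the class definition itself has no runtime dependency.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only needed for annotations in this sketch
    import pandas as pd


@dataclass(frozen=True)
class PipelineContext:
    """Hypothetical container: a DataFrame plus the metadata that df.attrs carried."""

    df: pd.DataFrame
    # Assumed typed field for the one value the issue says is carried today.
    original_text_column: str | None = None
    # Assumed catch-all for any other pipeline metadata.
    metadata: dict[str, Any] = field(default_factory=dict)

    def with_df(self, df: pd.DataFrame) -> PipelineContext:
        """Return a new context holding `df`; the metadata fields are carried over.

        Note: `replace` makes a shallow copy, so the new context shares the
        same `metadata` dict object as the original.
        """
        return replace(self, df=df)
```

A stage would then take a PipelineContext, transform ctx.df however it needs to (merge, concat, groupby), and return ctx.with_df(new_df). The metadata travels with the container, so it no longer depends on how pandas propagates attrs.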

Dependencies

No response

Metadata

Labels

task (Development task)
