GitHub - RaymondTang2003/Sheet-Raptor

Project Overview

Sheet-Raptor is a unified system for Context-Critical Spreadsheet Analysis that combines Large Language Models (LLMs) and Visual Language Models (VLMs) to process, analyze, and visualize tabular data.

Core Features

Structure-aware preprocessing that combines LLM-based table recognition with layout-sensitive folding, retaining only semantically meaningful table regions.
High-fidelity context retrieval that integrates multi-granular spreadsheet embeddings, question rewriting, and progressive agentic retrieval to capture relevant evidence across auxiliary spreadsheets with high precision and recall.
Agentic execution framework that decomposes spreadsheet tasks into spreadsheet-specific atomic operations and dynamically orchestrates semantic reasoning and program-oriented execution via task-aware multi-agent planning.

Examples

Some example spreadsheet files are provided in data/. More data will be released in the future.

System Architecture

Sheet-Raptor adopts an agent-based modular architecture consisting mainly of the following components:

SheetPreprocessor (Target spreadsheet → compact tables)

Location-Enhanced Table Recognition: uses an LLM-based table detector to identify tables with implicit/irregular boundaries (e.g., adjacent tables sharing topics but differing structures such as header depth).

Structure-Aware Folding: applies a variable-length sliding-window algorithm to detect repetitive structures and condense redundant rows/columns, preserving semantically meaningful regions while shrinking long tables.
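The folding idea can be illustrated with a minimal sketch. It is not the project's implementation: the `signature` heuristic (cell types only), the `fold_rows` name, and the `max_window` / `min_repeats` parameters are assumptions chosen for illustration. The sketch tries window lengths from 1 to `max_window`, looks for a block of rows whose structural pattern repeats back to back, and keeps only the first and last repetition with a placeholder row in between.

```python
from typing import List, Sequence, Tuple

Row = Sequence[object]


def signature(row: Row) -> Tuple[str, ...]:
    """Map a row to a coarse structural signature (cell types only)."""
    sig = []
    for cell in row:
        if cell is None or cell == "":
            sig.append("empty")
        elif isinstance(cell, (int, float)):
            sig.append("num")
        else:
            sig.append("text")
    return tuple(sig)


def fold_rows(rows: List[Row], max_window: int = 3, min_repeats: int = 3) -> List[Row]:
    """Condense runs of structurally repetitive rows (hypothetical helper)."""
    sigs = [signature(r) for r in rows]
    out: List[Row] = []
    i, n = 0, len(rows)
    while i < n:
        best_w, best_reps = 0, 0
        # Variable-length window: try every pattern length at this position.
        for w in range(1, max_window + 1):
            if i + w > n:
                break
            pattern = sigs[i:i + w]
            reps = 1
            while sigs[i + reps * w:i + (reps + 1) * w] == pattern:
                reps += 1
            # Prefer the window that covers the most rows.
            if reps >= min_repeats and reps * w > best_w * best_reps:
                best_w, best_reps = w, reps
        if best_reps:
            span = best_w * best_reps
            out.extend(rows[i:i + best_w])                # first repetition
            out.append((f"... {span - 2 * best_w} similar rows folded ...",))
            out.extend(rows[i + span - best_w:i + span])  # last repetition
            i += span
        else:
            out.append(rows[i])
            i += 1
    return out
```

Comparing signatures rather than cell values is a deliberate simplification: it treats rows as repetitive when they share a layout, even if their contents differ. The same routine could be applied to columns by transposing the table first.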

ContextRetriever (Auxiliary spreadsheets → high-recall context)

Builds a Multi-Granular Hierarchical Embedding Index covering file / sheet / table / row–column levels, trained via BERT-based contrastive learning for semantic retrieval.
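As a rough illustration of what a multi-granular index might look like, the sketch below stores one entry per file, sheet, table, row, or column and searches them by cosine similarity. Everything here is an assumption made for illustration: the `MultiGranularIndex` and `IndexEntry` names, the path format, and especially `embed`, which is a bag-of-words placeholder standing in for the contrastively trained BERT encoder.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class IndexEntry:
    level: str                 # "file", "sheet", "table", "row" or "column"
    path: str                  # hypothetical locator, e.g. "a.xlsx/Sheet1/table0"
    text: str                  # textual summary of the unit
    vector: Dict[str, float]   # sparse embedding of `text`


def embed(text: str) -> Dict[str, float]:
    """Placeholder encoder: L2-normalised bag of words (not a BERT model)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class MultiGranularIndex:
    def __init__(self) -> None:
        self.entries: List[IndexEntry] = []

    def add(self, level: str, path: str, text: str) -> None:
        self.entries.append(IndexEntry(level, path, text, embed(text)))

    def search(
        self, query: str, level: Optional[str] = None, top_k: int = 3
    ) -> List[Tuple[float, IndexEntry]]:
        """Return the top_k entries, optionally restricted to one level."""
        q = embed(query)
        pool = [e for e in self.entries if level is None or e.level == level]
        scored = [(cosine(q, e.vector), e) for e in pool]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

Keeping the level on each entry lets a retriever start coarse (files, sheets) and then narrow to tables or rows under the best-scoring parents.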

Mitigates retrieval noise and semantic fragmentation via:

Question Rewriting: augments the user query with inferred table topics and relevant cell ranges.

Progressive Retrieval: iteratively refines retrieval through agentic planning/refinement/search loops to improve precision and recall across multiple auxiliary spreadsheets.
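The rewrite / search / refine loop can be sketched as a small driver function. This is a minimal sketch under stated assumptions, not the repository's code: `progressive_retrieve` and its three callables (`rewrite`, `search`, `is_sufficient`) are hypothetical stand-ins for the LLM-driven rewriting, index lookup, and sufficiency check.

```python
from typing import Callable, List, Set


def progressive_retrieve(
    question: str,
    rewrite: Callable[[str, List[str]], str],
    search: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    max_rounds: int = 3,
) -> List[str]:
    """Iteratively rewrite the query and accumulate evidence (hypothetical)."""
    evidence: List[str] = []
    seen: Set[str] = set()
    for _ in range(max_rounds):
        # Plan / refine: rewrite the question using what was found so far.
        query = rewrite(question, evidence)
        # Search: collect new hits, skipping duplicates.
        for hit in search(query):
            if hit not in seen:
                seen.add(hit)
                evidence.append(hit)
        # Stop early once the evidence is judged sufficient.
        if is_sufficient(question, evidence):
            break
    return evidence
```

Passing the accumulated evidence back into `rewrite` is what makes the loop progressive: each round can target what is still missing rather than repeating the original query.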

SpreadsheetAnalyzer (Planning + execution for diverse tasks)

A central Routing Agent parses user intent and dispatches to one of three solver agents: QA Agent, Manipulation Agent, or Visualization Agent.

The selected solver constructs and executes a DAG of spreadsheet-specific atomic operations. Each operation is instantiated through either: a semantic-based template (executed by VLMs), or a program-oriented template (executed by code-generation LLMs).
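A minimal sketch of routing and DAG execution follows. The names (`route`, `Operation`, `execute_dag`) and the keyword-based router are assumptions for illustration; the actual Routing Agent and operation templates are model-driven and are not reproduced here. The sketch only shows the control flow: pick a solver, then run atomic operations in dependency order, feeding each operation the results of its predecessors.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


def route(query: str) -> str:
    """Toy keyword router standing in for the Routing Agent."""
    q = query.lower()
    if any(word in q for word in ("plot", "chart", "draw")):
        return "visualization"
    if any(word in q for word in ("delete", "insert", "sort", "update")):
        return "manipulation"
    return "qa"


@dataclass
class Operation:
    name: str
    kind: str                      # "semantic" or "program" (label only here)
    run: Callable[..., Any]        # receives the results of `deps`, in order
    deps: List[str] = field(default_factory=list)


def execute_dag(ops: List[Operation]) -> Dict[str, Any]:
    """Run operations in topological order and return all results by name."""
    by_name = {op.name: op for op in ops}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {op.name: set(op.deps) for op in ops}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        op = by_name[name]
        results[name] = op.run(*(results[d] for d in op.deps))
    return results
```

In this sketch `kind` is carried only as metadata; a fuller version would presumably dispatch semantic operations to a VLM call and program operations to generated code run in the sandbox.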

Installation Guide

Clone the Project

git clone https://github.com/RaymondTang2003/Sheet-Raptor.git
cd Sheet-Raptor

Install Dependencies

pip install -r requirements.txt

Configure LLM

Modify your LLM configuration in src/llm_clients/default_model.yaml as needed.

Configure Sandbox

Modify your sandbox configuration in src/processing/sandbox/config.py as needed.

Usage Examples

As shown in main.py, you can run the system with the following code:

from src.graph import spreadsheet_analysis

def main():
    query = ""  # Your query
    tables = []  # Your table list

    mount_dir = ""  # Your mount directory
    log_dir = ""  # Your log directory
    enable_knowledge = False  # Whether to enable knowledge

    spreadsheet_analysis(query, tables, mount_dir, log_dir, enable_knowledge)

if __name__ == '__main__':
    main()

Project Structure

Sheet-Raptor/
├── src/
│   ├── agents/             # Various Agent implementations
│   │   ├── excel_agent/    # Excel processing Agent
│   │   ├── answer_agent.py     # Answer Agent
│   │   ├── code_agent.py       # Code execution Agent
│   │   ├── execution_agent.py  # Execution Agent
│   │   ├── manipulation_agent.py # Table manipulation Agent
│   │   ├── planning_agent.py   # Planning Agent
│   │   ├── plot_planning_agent.py # Chart planning Agent
│   │   ├── route_agent.py      # Routing Agent
│   │   └── vlm_answer_agent.py # VLM answer Agent
│   ├── build_knowledge/    # Knowledge base construction
│   ├── configs/            # Configuration files
│   ├── context/            # Context management
│   ├── llm_clients/        # LLM clients
│   ├── processing/         # Table preprocessing
│   ├── prompts/            # Agent prompts
│   ├── rag/                # Retrieval-augmented generation
│   ├── sandbox/            # Code execution sandbox
│   ├── table2image/        # Table to image conversion
│   └── graph.py            # Main graph structure
├── README.md               # Project documentation
└── requirements.txt        # Dependency file

Core Process

Spreadsheet Preprocessing: Loads Excel files, locates tables, and converts them to images.
Context Retrieval: Retrieves relevant context from auxiliary spreadsheets using multi-granular embeddings and progressive agentic retrieval.
Task Routing: RouteAgent analyzes the task type and table status and selects the appropriate processing path.
Task Execution: Calls the corresponding agents to process the task based on the routing result.
Result Generation: Generates the final answer or execution result.
Log Recording: Saves the processing steps and results.

Baseline Results

We compare the results of our system with existing baselines on a set of benchmark tasks.

The figure below shows the comparison of our system with other baselines on our curated dataset.

License

This project is released under the MIT License.

Acknowledgments

Thanks to all developers and users who have contributed to the project!

Sheet-Raptor - Making table processing intelligent and efficient!

