
docs: migrate chat-webpage-simple-rag cookbook to v2 API (5/N)#95

Closed
Vikrant-Khedkar wants to merge 1 commit into main from docs/cookbook-chat-rag-v2


Vikrant-Khedkar (Collaborator) commented May 11, 2026

Summary

This is the fifth of N PRs (following #91, #92, #93, and #94) that restore the cookbook notebooks removed in 1f3b123 and migrate them to the v2 SDK API. This PR migrates cookbook/chat-webpage-simple-rag/scrapegraph_burr_lancedb.ipynb, the heaviest notebook in the set: a RAG pipeline built with Burr, LanceDB, OpenAI, and OpenTelemetry.

Migration

| Old | New |
|---|---|
| from scrapegraph_py import Client | from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig |
| from scrapegraph_py.logger import sgai_logger | Removed (the module no longer exists in v2) |
| sgai_logger.set_logging(level="INFO") | logging.basicConfig(level=logging.INFO) (stdlib) |
| Client() | ScrapeGraphAI() |
| sgai_client.markdownify(website_url=url) | sgai_client.scrape(url, formats=[MarkdownFormatConfig()]) |
| response["result"] (str) | "\n\n".join(response.data.results["markdown"]["data"]) (list[str] → joined str), plus a .status check |
| https://dashboard.scrapegraphai.com/ | https://scrapegraphai.com/dashboard |
| Old banner (143 KB inline base64) | New ScrapeGraphAI banner (77 KB inline base64) |
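The call-site changes in the table can be sketched as a small helper. This is a minimal sketch, not the notebook's code: scrape_page is a hypothetical name, and the client and format objects are injected so the example runs without the SDK installed. In the notebook you would pass ScrapeGraphAI() as the client and [MarkdownFormatConfig()] as the formats.

```python
import logging
from typing import Any, Sequence

# stdlib replacement for the removed sgai_logger.set_logging(level="INFO")
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def scrape_page(sgai_client: Any, url: str, formats: Sequence[Any]) -> Any:
    """Hypothetical helper wrapping the v2 scrape call.

    v1: sgai_client.markdownify(website_url=url)
    v2: sgai_client.scrape(url, formats=[MarkdownFormatConfig()])
    The client and formats are passed in, so this sketch does not import the SDK.
    """
    logger.info("Scraping %s", url)
    # v2 takes the URL positionally plus a list of format configs.
    return sgai_client.scrape(url, formats=list(formats))
```

Injecting the client keeps the step testable with a fake object; ScrapeGraphAI() itself reads SGAI_API_KEY from the environment, as the commit message notes.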

Notable v2 response shape change

scrape() returns response.data.results["markdown"]["data"] as a list[str] (one element per page), not a single string like the old markdownify(). Cell 15's fetch_webpage action now joins the pages with \n\n before passing the result into the chunking pipeline.
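The join step can be illustrated as a standalone function. This is a sketch, not the notebook's fetch_webpage action: join_markdown_pages is a hypothetical name, and fake_response is a SimpleNamespace stand-in built only from the response shape described above. The notebook's .status check is omitted because this PR does not state which status values it accepts.

```python
from types import SimpleNamespace
from typing import Any


def join_markdown_pages(response: Any) -> str:
    """Collapse v2 scrape() markdown output into the single string the chunker expects.

    response.data.results["markdown"]["data"] is a list[str], one entry per page;
    the old markdownify() returned a single str in response["result"].
    """
    pages = response.data.results["markdown"]["data"]
    return "\n\n".join(pages)


# Stand-in for a real v2 response, shaped as described above (illustration only).
fake_response = SimpleNamespace(
    data=SimpleNamespace(results={"markdown": {"data": ["# Home", "# Pricing"]}})
)
print(join_markdown_pages(fake_response))
```

Joining with a blank line keeps page boundaries visible as paragraph breaks, so downstream chunking treats each page's markdown as separate blocks.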

Validation

  • Tested headlessly via jupyter nbconvert --execute: the migrated cells (11, 14, 15) all run cleanly. Verified that scrape() returns proper markdown and that the chunking and embedding step (cell 22) consumes the joined string correctly. The full app.run() pipeline (cell 29) takes more than 10 minutes because every chunk of scrapegraphai.com is embedded through OpenAI; that is pre-existing notebook behavior, not a migration regression.

Follow-ups (separate PRs)

This completes all 5 direct-SDK notebooks. Remaining cookbook work:

  • LangChain integration notebooks (4), via the langchain-scrapegraph wrapper; need separate verification
  • LlamaIndex integration notebooks (5), via the llama-index-tools-scrapegraph wrapper; need separate verification
  • LangGraph integration notebooks (3)
  • CrewAI integration notebook (1)

🤖 Generated with Claude Code


github-actions Bot commented May 11, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

- Client -> ScrapeGraphAI (auto-reads SGAI_API_KEY)
- markdownify(website_url=) -> scrape(url, formats=[MarkdownFormatConfig()])
- Drop scrapegraph_py.logger.sgai_logger (no longer exists) -> stdlib logging
- Response shape: response['result'] (str) -> "\n\n".join(response.data.results["markdown"]["data"]) (list[str])
- Update dashboard URL: dashboard.scrapegraphai.com -> scrapegraphai.com/dashboard
- Swap outdated banner image

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vikrant-Khedkar force-pushed the docs/cookbook-chat-rag-v2 branch from 81ec553 to 8b00a13 on May 12, 2026 07:11
Vikrant-Khedkar marked this pull request as ready for review on May 12, 2026 07:12
VinciGit00 (Member) left a comment


Burr is not required anymore.

VinciGit00 closed this on May 12, 2026
VinciGit00 deleted the docs/cookbook-chat-rag-v2 branch on May 12, 2026 07:18

Labels

None yet

Projects

None yet


2 participants