ScrapeGraph Python SDK for API
🌐 ScrapeGraph Python SDK
Official Python SDK for the ScrapeGraph API - Smart web scraping powered by AI.
📦 Installation
pip install scrapegraph-py
🚀 Features
- 🤖 AI-powered web scraping and search
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
- ⚡ Automatic retries
- 🔐 Secure authentication
🎯 Quick Start
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
[!NOTE] You can set the SGAI_API_KEY environment variable and initialize the client without parameters: client = Client()
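When relying on the environment variable, it can help to fail early with a clear message if the key is missing. The following is a minimal sketch using only the standard library. `resolve_api_key` is a hypothetical helper, not part of the SDK, and the rule that an explicit key takes precedence over the environment is an assumption made for illustration.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return an API key from an explicit argument or from SGAI_API_KEY.

    Hypothetical helper for illustration; not part of scrapegraph_py.
    """
    # Allow a custom mapping to be injected so the lookup is easy to test.
    env = os.environ if env is None else env
    # Assumption: an explicitly passed key wins over the environment variable.
    key = explicit_key or env.get("SGAI_API_KEY")
    if not key:
        raise RuntimeError("Set SGAI_API_KEY or pass an API key explicitly")
    return key
```

The returned value could then be passed to the client as `Client(api_key=resolve_api_key())`.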
📚 Available Endpoints
🤖 SmartScraper
Extract structured data from any webpage or HTML content using AI.
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
# Using a URL
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)
# Or using HTML content
html_content = """
<html>
<body>
<h1>Company Name</h1>
<p>We are a technology company focused on AI solutions.</p>
</body>
</html>
"""
response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description"
)
print(response)
Output Schema (Optional)
from pydantic import BaseModel, Field
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
class WebsiteData(BaseModel):
    title: str = Field(description="The page title")
    description: str = Field(description="The meta description")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the title and description",
    output_schema=WebsiteData
)
🔍 SearchScraper
Perform AI-powered web searches with structured results and reference URLs.
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)
print(f"Answer: {response['result']}")
print(f"Sources: {response['reference_urls']}")
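The two keys printed above can be combined into a small report. This is a sketch that assumes the response is a plain dict with `result` and `reference_urls` keys, as the example above suggests. `format_answer` is a hypothetical helper, the sample values are made up, and treating a missing `reference_urls` key as an empty list is an assumption rather than documented behavior.

```python
def format_answer(response):
    """Render a SearchScraper-style response dict as plain text."""
    lines = [f"Answer: {response.get('result')}"]
    # Number each source URL; a missing or empty list adds no lines.
    for i, url in enumerate(response.get("reference_urls") or [], start=1):
        lines.append(f"  [{i}] {url}")
    return "\n".join(lines)


# Sample data shaped like the response used above; the values are invented.
sample = {
    "result": "example answer",
    "reference_urls": ["https://example.com/a", "https://example.com/b"],
}
print(format_answer(sample))
```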
Output Schema (Optional)
from pydantic import BaseModel, Field
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
class PythonVersionInfo(BaseModel):
    version: str = Field(description="The latest Python version number")
    release_date: str = Field(description="When this version was released")
    major_features: list[str] = Field(description="List of main features")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?",
    output_schema=PythonVersionInfo
)
📝 Markdownify
Convert any webpage into clean, formatted Markdown.
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
response = client.markdownify(
    website_url="https://example.com"
)
print(response)
⚡ Async Support
All endpoints support async operations:
import asyncio
from scrapegraph_py import AsyncClient
async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
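A common reason to use an async client is to run several requests concurrently. The sketch below shows the pattern with `asyncio.gather` from the standard library. To keep it runnable without an API key or network access, it uses `fake_smartscraper`, a hypothetical stand-in coroutine. Both `fake_smartscraper` and `scrape_all` are illustrative names and not part of the SDK.

```python
import asyncio


async def fake_smartscraper(website_url, user_prompt):
    """Stand-in for `await client.smartscraper(...)`; makes no network call."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would while waiting
    return {"website_url": website_url, "user_prompt": user_prompt}


async def scrape_all(urls, prompt, scrape=fake_smartscraper):
    """Start one scrape per URL concurrently and return results in input order."""
    tasks = [scrape(website_url=url, user_prompt=prompt) for url in urls]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    scrape_all(
        ["https://example.com", "https://example.org"],
        "Extract the main content",
    )
)
print(results)
```

With a real client, the same pattern should apply by passing `client.smartscraper` as the `scrape` argument inside an `async with AsyncClient() as client:` block, assuming the method accepts the keyword arguments shown in the example above.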
📖 Documentation
For detailed documentation, visit docs.scrapegraphai.com
🛠️ Development
For information about setting up the development environment and contributing to the project, see our Contributing Guide.
💬 Support & Feedback
- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: Create an issue
- 🌟 Feature Requests: Request a feature
- ⭐ API Feedback: You can also submit feedback programmatically using the feedback endpoint:
from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔗 Links
Made with ❤️ by ScrapeGraph AI