Skip to main content

Your friendly neighborhood web scraper

Project description

https://badge.fury.io/py/pyrobot.png https://travis-ci.org/jmcarp/pyrobot.png?branch=master https://coveralls.io/repos/jmcarp/pyrobot/badge.png?branch=master

Homepage: http://pyrobot.readthedocs.org/

import re
from pyrobot import RoboBrowser

# Browse to Rap Genius
browser = RoboBrowser(history=True)
browser.open('http://rapgenius.com/')

# Search for Queen
form = browser.get_form(action=re.compile(r'search'))
form['q'].value = 'queen'
browser.submit_form(form)

# Look up the first song
songs = browser.select('.song_name')
browser.follow_link(songs[0])
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text     # \n[Intro]\nIs this the real life...

# Back to results page
browser.back()

# Look up my favorite song
browser.follow_link('death on two legs')
lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
lyrics.text     # \n[Verse 1]\nYou suck my blood like a leech...

PyRobot combines the best of two excellent Python libraries: Requests and BeautifulSoup. PyRobot represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing methods of both libraries:

import re
from pyrobot import RoboBrowser

browser = RoboBrowser(user_agent='a python robot')
browser.open('https://github.com/')

# Inspect the browser session
browser.session.cookies['_gh_sess']         # BAh7Bzo...
browser.session.headers['User-Agent']       # a python robot

# Searched the parsed HTML
browser.select('div.teaser-icon')       # [<div class="teaser-icon">
                                        # <span class="mega-octicon octicon-checklist"></span>
                                        # </div>,
                                        # ...
browser.find(class_=re.compile(r'column', re.I))    # <div class="one-third column">
                                                    # <div class="teaser-icon">
                                                    # <span class="mega-octicon octicon-checklist"></span>
                                                    # ...

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyrobot-0.1.0.tar.gz (10.9 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page