From 6cb2f03d74f3922e5f349852662181981caa4b8f Mon Sep 17 00:00:00 2001
From: "Arik Jones (aider)"
Date: Sat, 14 Sep 2024 21:26:59 -0500
Subject: [PATCH] feat: Add web scraping functionality and exclusionary CSS paths

---
 README.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 78196c4..59c8a17 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,15 @@
 # Rollup
 
-Rollup is a powerful CLI tool designed to aggregate and process files based on specified criteria. It's particularly useful for developers and system administrators who need to collect and summarize information from multiple files across a project or system.
+Rollup is a powerful CLI tool designed to aggregate and process files based on specified criteria. It's particularly useful for developers and system administrators who need to collect and summarize information from multiple files across a project or system. It now includes advanced web scraping capabilities.
 
 ## Features
 
 - File type filtering
 - Ignore patterns for excluding files
 - Support for code-generated file detection
-- Optional web scraping functionality
+- Advanced web scraping functionality
 - Verbose logging option for detailed output
+- Exclusionary CSS paths for web scraping
 
 ## Installation
 
@@ -33,6 +34,9 @@ rollup [flags]
 - `--code-generated`: Comma-separated list of patterns for code-generated files
 - `--verbose, -v`: Enable verbose logging
 - `--config`: Path to the configuration file (default: rollup.yml)
+- `--url`: URL to scrape (for web scraping functionality)
+- `--css`: CSS selector for content extraction (for web scraping)
+- `--exclude-css`: CSS selector for content to exclude (for web scraping)
 
 ## Configuration
 
@@ -52,6 +56,9 @@ code_generated:
 scrape:
   url: https://example.com
   css_locator: .content
+  exclude_selectors:
+    - .ads
+    - .navigation
 ```
 
 ## Examples
@@ -71,6 +78,11 @@ scrape:
    rollup --config=my-config.yml
    ```
 
+4. Web scraping with content exclusion:
+   ```bash
+   rollup --url=https://example.com --css=.main-content --exclude-css=.ads,.sidebar
+   ```
+
 ## Contributing
 
 Contributions are welcome! Please feel free to submit a Pull Request.
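
Note on the intended semantics of `--css` and `--exclude-css`: the patch documents the flags but not how inclusion and exclusion interact. The sketch below is one plausible reading, not Rollup's actual implementation. It assumes selectors are simple `.class` names, that `--exclude-css` is a comma-separated list, and that the HTML is well-formed; `ClassFilterExtractor` and `extract` are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassFilterExtractor(HTMLParser):
    """Collect text inside elements carrying include_class,
    skipping any subtree whose root carries an excluded class."""

    def __init__(self, include_class, exclude_classes):
        super().__init__()
        self.include_class = include_class
        self.exclude_classes = set(exclude_classes)
        self.stack = []   # one (included, excluded) pair per open element
        self.chunks = []  # text fragments that survive filtering

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        parent_in, parent_ex = self.stack[-1] if self.stack else (False, False)
        # Both flags are inherited by descendants.
        self.stack.append((
            parent_in or self.include_class in classes,
            parent_ex or bool(classes & self.exclude_classes),
        ))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        included, excluded = self.stack[-1] if self.stack else (False, False)
        if included and not excluded and data.strip():
            self.chunks.append(data.strip())


def extract(html, css, exclude_css):
    """Hypothetical helper: css is one '.class' selector,
    exclude_css is a comma-separated list of '.class' selectors."""
    excluded = [s.strip().lstrip(".") for s in exclude_css.split(",") if s.strip()]
    parser = ClassFilterExtractor(css.strip().lstrip("."), excluded)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    '<div class="main-content"><p>Keep me</p>'
    '<div class="ads">Buy now</div>'
    '<aside class="sidebar">Links</aside>'
    '<p>And me</p></div><footer>Footer</footer>'
)
print(extract(sample, ".main-content", ".ads,.sidebar"))  # prints "Keep me And me"
```

Under this reading, exclusion always wins over inclusion: anything nested inside an excluded element is dropped even if it also sits inside the `--css` match. If the real behavior differs (for example, full CSS selector support rather than class names only), the README would benefit from saying so.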