feat: Add web scraping functionality and exclusionary CSS paths

Arik Jones (aider)
2024-09-14 21:26:59 -05:00
parent bb12e3d029
commit 6cb2f03d74

@@ -1,14 +1,15 @@
# Rollup
Rollup is a powerful CLI tool designed to aggregate and process files based on specified criteria. It's particularly useful for developers and system administrators who need to collect and summarize information from multiple files across a project or system.
Rollup is a powerful CLI tool designed to aggregate and process files based on specified criteria. It's particularly useful for developers and system administrators who need to collect and summarize information from multiple files across a project or system. It now includes advanced web scraping capabilities.
## Features
- File type filtering
- Ignore patterns for excluding files
- Support for code-generated file detection
- Optional web scraping functionality
- Advanced web scraping functionality
- Verbose logging option for detailed output
- Exclusionary CSS paths for web scraping
## Installation
@@ -33,6 +34,9 @@ rollup [flags]
- `--code-generated`: Comma-separated list of patterns for code-generated files
- `--verbose, -v`: Enable verbose logging
- `--config`: Path to the configuration file (default: rollup.yml)
- `--url`: URL to scrape (for web scraping functionality)
- `--css`: CSS selector for content extraction (for web scraping)
- `--exclude-css`: CSS selector for content to exclude (for web scraping)
## Configuration
@@ -52,6 +56,9 @@ code_generated:
scrape:
url: https://example.com
css_locator: .content
exclude_selectors:
- .ads
- .navigation
```
## Examples
@@ -71,6 +78,11 @@ scrape:
rollup --config=my-config.yml
```
4. Web scraping with content exclusion:
```bash
rollup --url=https://example.com --css=.main-content --exclude-css=.ads,.sidebar
```
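To illustrate what content exclusion means in example 4, here is a minimal sketch in Python. It is not Rollup's implementation: the `ClassFilterExtractor` class and the `extract` helper are hypothetical names. It assumes that `--exclude-css` is split on commas and that every selector is a simple class selector such as `.ads`. It keeps the text found inside the included class and drops anything nested in an excluded class.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterExtractor(HTMLParser):
    """Collect text inside elements of one class, skipping excluded classes.

    Hypothetical helper: only simple class selectors are handled, which is
    an assumption made for illustration.
    """

    def __init__(self, include_class, exclude_classes):
        super().__init__()
        self.include_class = include_class
        self.exclude_classes = set(exclude_classes)
        self.include_depth = 0  # > 0 while inside an included element
        self.exclude_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.exclude_depth or classes & self.exclude_classes:
            self.exclude_depth += 1
        elif self.include_depth or self.include_class in classes:
            self.include_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.exclude_depth:
            self.exclude_depth -= 1
        elif self.include_depth:
            self.include_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.include_depth and not self.exclude_depth:
            self.chunks.append(text)


def extract(html, css, exclude_css):
    """Return the included text, assuming comma-separated class selectors."""
    excluded = [s.strip().lstrip(".") for s in exclude_css.split(",") if s.strip()]
    parser = ClassFilterExtractor(css.strip().lstrip("."), excluded)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        '<div class="main-content"><p>Keep</p>'
        '<div class="ads"><span>Buy</span></div><p>Also</p></div>'
        '<div class="sidebar">Nav</div>'
    )
    print(extract(page, ".main-content", ".ads,.sidebar"))
```

A real scraper would likely rely on a full CSS selector engine rather than class matching. The sketch only shows the include-then-exclude order that the flags suggest.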
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.