Mirror of https://github.com/tnypxl/rollup.git, synced 2025-12-15 15:03:17 +00:00

Comparing 4 commits:

- 1869dae89a
- d3ff7cb862
- ea410e4abb
- 7d8e25b1ad

README.md — 59 lines changed
```diff
@@ -4,16 +4,18 @@ Rollup aggregates the contents of text-based files and webpages into a markdown
 
 ## Features
 
-- File type filtering
+- File type filtering for targeted content aggregation
-- Ignore patterns for excluding files
+- Ignore patterns for excluding specific files or directories
-- Support for code-generated file detection
+- Support for code-generated file detection and exclusion
-- Advanced web scraping functionality
+- Advanced web scraping functionality with depth control
-- Verbose logging option for detailed output
+- Verbose logging option for detailed operation insights
-- Exclusionary CSS selectors for web scraping
+- Exclusionary CSS selectors for precise web content extraction
-- Support for multiple URLs in web scraping
+- Support for multiple URLs in web scraping operations
 - Configurable output format for web scraping (single file or separate files)
-- Configuration file support (YAML)
+- Flexible configuration file support (YAML)
-- Generation of default configuration file
+- Automatic generation of default configuration file
+- Custom output file naming
+- Concurrent processing for improved performance
 
 ## Installation
```
````diff
@@ -74,14 +76,27 @@ ignore:
 code_generated:
   - **/generated/**
 scrape:
-  urls:
+  sites:
-    - url: https://example.com
+    - base_url: https://example.com
       css_locator: .content
       exclude_selectors:
         - .ads
         - .navigation
+      max_depth: 2
+      allowed_paths:
+        - /blog
+        - /docs
+      exclude_paths:
+        - /admin
       output_alias: example
+      path_overrides:
+        - path: /special-page
+          css_locator: .special-content
+          exclude_selectors:
+            - .special-ads
 output_type: single
+requests_per_second: 1.0
+burst_limit: 3
 ```
 
 ## Examples
````
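Pieced together from the renamed and added keys in this hunk, the updated `scrape` section of the config would look roughly like the sketch below. The nesting of keys such as `output_type`, `requests_per_second`, and `burst_limit` is inferred from the flattened diff and may differ from the actual README:

```yaml
scrape:
  sites:
    - base_url: https://example.com
      css_locator: .content
      exclude_selectors:
        - .ads
        - .navigation
      max_depth: 2
      allowed_paths:
        - /blog
        - /docs
      exclude_paths:
        - /admin
      output_alias: example
      path_overrides:
        - path: /special-page
          css_locator: .special-content
          exclude_selectors:
            - .special-ads
  output_type: single
  requests_per_second: 1.0
  burst_limit: 3
```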
````diff
@@ -92,10 +107,10 @@ scrape:
 rollup files
 ```
 
-2. Web scraping with multiple URLs:
+2. Web scraping with multiple URLs and increased concurrency:
 
 ```bash
-rollup web --urls=https://example.com,https://another-example.com
+rollup web --urls=https://example.com,https://another-example.com --concurrent=8
 ```
 
 3. Generate a default configuration file:
````
````diff
@@ -104,15 +119,25 @@ scrape:
 rollup generate
 ```
 
-4. Use a custom configuration file:
+4. Use a custom configuration file and specify output:
 
 ```bash
-rollup files --config=my-config.yml
+rollup files --config=my-config.yml --output=project_summary.md
 ```
 
-5. Web scraping with separate output files:
+5. Web scraping with separate output files and custom timeout:
 ```bash
-rollup web --urls=https://example.com,https://another-example.com --output=separate
+rollup web --urls=https://example.com,https://another-example.com --output=separate --timeout=60
 ```
 
+6. Rollup files with specific types and ignore patterns:
+
+```bash
+rollup files --types=.go,.md --ignore=vendor/**,*_test.go
+```
+
+7. Web scraping with depth and CSS selector:
+
+```bash
+rollup web --urls=https://example.com --depth=2 --css=.main-content
+```
 
 ## Contributing
````
docs/CHANGELOG.md — new file, 21 lines

```diff
@@ -0,0 +1,21 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.0.3] - 2024-09-22
+
+### Added
+- Implemented web scraping functionality using Playwright
+- Added support for CSS selectors to extract specific content
+- Introduced rate limiting for web requests
+- Created configuration options for scraping settings
+
+### Changed
+- Improved error handling and logging throughout the application
+- Enhanced URL parsing and validation
+
+### Fixed
+- Resolved issues with concurrent scraping operations
```
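The "rate limiting for web requests" entry pairs with the `requests_per_second` and `burst_limit` keys added to the config in this change set. A minimal token-bucket sketch of how those two values could interact — illustrative only, not rollup's actual implementation; all names here are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a minimal token-bucket rate limiter. capacity plays the
// role of burst_limit (how many requests may fire back to back) and rate
// plays the role of requests_per_second (how fast tokens refill).
type tokenBucket struct {
	tokens   float64   // currently available tokens
	capacity float64   // burst_limit: max tokens held at once
	rate     float64   // requests_per_second: refill rate
	last     time.Time // timestamp of the last refill
}

func newTokenBucket(requestsPerSecond float64, burstLimit int) *tokenBucket {
	return &tokenBucket{
		tokens:   float64(burstLimit),
		capacity: float64(burstLimit),
		rate:     requestsPerSecond,
		last:     time.Now(),
	}
}

// allow reports whether a request may proceed now, consuming a token if so.
func (b *tokenBucket) allow() bool {
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// requests_per_second: 1.0, burst_limit: 3 — the values from the example config.
	limiter := newTokenBucket(1.0, 3)
	allowed := 0
	for i := 0; i < 5; i++ {
		if limiter.allow() {
			allowed++
		}
	}
	// The burst of 3 passes immediately; further requests must wait for refill.
	fmt.Println("allowed:", allowed)
}
```

In practice a Go project would more likely reach for `golang.org/x/time/rate`, whose `Limiter` takes exactly this pair of parameters (rate and burst).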