Mirror of https://github.com/tnypxl/rollup.git, synced 2025-12-13 06:23:18 +00:00
Compare commits: 5 commits

| SHA1 |
|---|
| 1869dae89a |
| d3ff7cb862 |
| ea410e4abb |
| 7d8e25b1ad |
| 691832e282 |
README.md (59 lines changed)
@@ -4,16 +4,18 @@ Rollup aggregates the contents of text-based files and webpages into a markdown
 
 ## Features
 
-- File type filtering
-- Ignore patterns for excluding files
-- Support for code-generated file detection
-- Advanced web scraping functionality
-- Verbose logging option for detailed output
-- Exclusionary CSS selectors for web scraping
-- Support for multiple URLs in web scraping
+- File type filtering for targeted content aggregation
+- Ignore patterns for excluding specific files or directories
+- Support for code-generated file detection and exclusion
+- Advanced web scraping functionality with depth control
+- Verbose logging option for detailed operation insights
+- Exclusionary CSS selectors for precise web content extraction
+- Support for multiple URLs in web scraping operations
+- Configurable output format for web scraping (single file or separate files)
-- Configuration file support (YAML)
-- Generation of default configuration file
+- Flexible configuration file support (YAML)
+- Automatic generation of default configuration file
 - Custom output file naming
+- Concurrent processing for improved performance
 
 ## Installation
 
@@ -74,14 +76,27 @@ ignore:
 code_generated:
   - **/generated/**
 scrape:
-  urls:
-    - url: https://example.com
+  sites:
+    - base_url: https://example.com
       css_locator: .content
       exclude_selectors:
         - .ads
         - .navigation
+      max_depth: 2
+      allowed_paths:
+        - /blog
+        - /docs
+      exclude_paths:
+        - /admin
+      output_alias: example
+      path_overrides:
+        - path: /special-page
+          css_locator: .special-content
+          exclude_selectors:
+            - .special-ads
   output_type: single
+  requests_per_second: 1.0
+  burst_limit: 3
 ```
 
 ## Examples
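
The new `sites` entries combine several per-site crawl controls. As a rough illustration of how such settings could interact, the Go sketch below applies `max_depth`, `allowed_paths`, `exclude_paths`, and `path_overrides` to a URL path. The type and function names (`Site`, `PathOverride`, `shouldVisit`, `locatorFor`) and the precedence rules (excludes win over allows, an empty allow list permits everything, overrides match a path exactly) are assumptions made for this example, not taken from Rollup's source.

```go
package main

import (
	"fmt"
	"strings"
)

// PathOverride and Site mirror the YAML keys in the hunk above.
// The Go names are illustrative assumptions.
type PathOverride struct {
	Path       string
	CSSLocator string
}

type Site struct {
	CSSLocator    string
	MaxDepth      int
	AllowedPaths  []string
	ExcludePaths  []string
	PathOverrides []PathOverride
}

// hasPathPrefix reports whether p equals one of the prefixes or lies below it.
// "/blog" covers "/blog" and "/blog/post" but not "/blogroll".
func hasPathPrefix(p string, prefixes []string) bool {
	for _, pre := range prefixes {
		if p == pre || strings.HasPrefix(p, strings.TrimSuffix(pre, "/")+"/") {
			return true
		}
	}
	return false
}

// shouldVisit applies the depth limit first, then excludes, then the allow list.
// An empty allow list is treated as "allow everything" (assumption).
func (s Site) shouldVisit(urlPath string, depth int) bool {
	if depth > s.MaxDepth {
		return false
	}
	if hasPathPrefix(urlPath, s.ExcludePaths) {
		return false
	}
	return len(s.AllowedPaths) == 0 || hasPathPrefix(urlPath, s.AllowedPaths)
}

// locatorFor returns the override selector for an exactly matching path,
// falling back to the site-wide selector.
func (s Site) locatorFor(urlPath string) string {
	for _, o := range s.PathOverrides {
		if urlPath == o.Path {
			return o.CSSLocator
		}
	}
	return s.CSSLocator
}

func main() {
	// Values copied from the example configuration in the hunk.
	site := Site{
		CSSLocator:    ".content",
		MaxDepth:      2,
		AllowedPaths:  []string{"/blog", "/docs"},
		ExcludePaths:  []string{"/admin"},
		PathOverrides: []PathOverride{{Path: "/special-page", CSSLocator: ".special-content"}},
	}
	for _, p := range []string{"/blog/post", "/admin", "/special-page"} {
		fmt.Println(p, site.shouldVisit(p, 1), site.locatorFor(p))
	}
}
```

Keeping the visit decision and the selector lookup as separate functions means a path can be skipped by the crawl rules without ever consulting its override.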
@@ -92,10 +107,10 @@ scrape:
 rollup files
 ```
 
-2. Web scraping with multiple URLs:
+2. Web scraping with multiple URLs and increased concurrency:
 
 ```bash
-rollup web --urls=https://example.com,https://another-example.com
+rollup web --urls=https://example.com,https://another-example.com --concurrent=8
 ```
 
 3. Generate a default configuration file:
@@ -104,15 +119,25 @@ scrape:
 rollup generate
 ```
 
-4. Use a custom configuration file:
+4. Use a custom configuration file and specify output:
 
 ```bash
-rollup files --config=my-config.yml
+rollup files --config=my-config.yml --output=project_summary.md
 ```
 
-5. Web scraping with separate output files:
+5. Web scraping with separate output files and custom timeout:
 ```bash
-rollup web --urls=https://example.com,https://another-example.com --output=separate
+rollup web --urls=https://example.com,https://another-example.com --output=separate --timeout=60
 ```
 
+6. Rollup files with specific types and ignore patterns:
+```bash
+rollup files --types=.go,.md --ignore=vendor/**,*_test.go
+```
+
+7. Web scraping with depth and CSS selector:
+```bash
+rollup web --urls=https://example.com --depth=2 --css=.main-content
+```
+
 ## Contributing
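
Example 2 in the README diff passes `--concurrent=8` when scraping several URLs. One common way to cap the number of simultaneous fetches in Go is a buffered channel used as a semaphore; the sketch below shows that pattern. `fetchAll` and its `fetch` callback are hypothetical names for illustration, and nothing here is taken from how Rollup actually implements the flag.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll calls fetch once per URL with at most `limit` calls in flight,
// and returns the results in input order.
func fetchAll(urls []string, limit int, fetch func(string) string) []string {
	if limit < 1 {
		limit = 1
	}
	results := make([]string, len(urls))
	sem := make(chan struct{}, limit) // one slot per allowed concurrent fetch
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` fetches are already running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this fetch ends
			results[i] = fetch(u)    // each goroutine writes only its own index
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"https://example.com", "https://another-example.com"}
	// Stand-in for a real page fetch.
	out := fetchAll(urls, 8, func(u string) string { return "scraped " + u })
	for _, line := range out {
		fmt.Println(line)
	}
}
```

Writing each result to its own slice index avoids a mutex and keeps the output order independent of which fetch finishes first.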
@@ -67,7 +67,7 @@ func TestIsIgnored(t *testing.T) {
 		{"subdir/file.log", true},
 		{"subdir/file.txt", false},
 		{".git/config", true},
-		{"src/.git/config", true},
+		{"src/.git/config", false},
 		{"vendor/package/file.go", true},
 		{"internal/vendor/file.go", false},
 	}
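
The changed expectation for `src/.git/config` suggests that directory patterns are matched from the scan root rather than at any depth, which is also consistent with `internal/vendor/file.go` not being ignored. The patterns under test are not visible in this hunk, so the Go sketch below assumes `*.log`, `.git/**`, and `vendor/**`. `matchesIgnore` is a hypothetical stand-in written for this illustration, not the repository's function.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesIgnore reports whether relPath (slash-separated, relative to the
// scan root) matches any ignore pattern. Hypothetical helper for illustration.
func matchesIgnore(relPath string, patterns []string) bool {
	for _, p := range patterns {
		// "dir/**" covers dir and everything below it, anchored at the root.
		if strings.HasSuffix(p, "/**") {
			dir := strings.TrimSuffix(p, "/**")
			if relPath == dir || strings.HasPrefix(relPath, dir+"/") {
				return true
			}
			continue
		}
		// Other patterns are globbed against the whole path first.
		if ok, _ := path.Match(p, relPath); ok {
			return true
		}
		// Slash-free patterns such as "*.log" also apply to the base name.
		if !strings.Contains(p, "/") {
			if ok, _ := path.Match(p, path.Base(relPath)); ok {
				return true
			}
		}
	}
	return false
}

func main() {
	// Assumed patterns; the hunk does not show the ones the test uses.
	patterns := []string{"*.log", ".git/**", "vendor/**"}
	for _, p := range []string{".git/config", "src/.git/config", "internal/vendor/file.go"} {
		fmt.Println(p, matchesIgnore(p, patterns))
	}
}
```

Under these assumed rules, all seven table entries in the hunk evaluate to their post-change expected values.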
docs/CHANGELOG.md (new file, 21 lines added)
@@ -0,0 +1,21 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.0.3] - 2024-09-22
+
+### Added
+- Implemented web scraping functionality using Playwright
+- Added support for CSS selectors to extract specific content
+- Introduced rate limiting for web requests
+- Created configuration options for scraping settings
+
+### Changed
+- Improved error handling and logging throughout the application
+- Enhanced URL parsing and validation
+
+### Fixed
+- Resolved issues with concurrent scraping operations
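
The changelog mentions rate limiting for web requests, and the README configuration exposes `requests_per_second` and `burst_limit`. Those two settings map naturally onto a token bucket, although the changelog does not say which algorithm is used. The Go sketch below is a minimal token bucket under that assumption; `bucket`, `newBucket`, and `allow` are hypothetical names, and time is passed in as plain seconds so the behaviour is easy to follow.

```go
package main

import "fmt"

// bucket is a minimal token bucket: it holds up to `burst` tokens and
// refills at `rate` tokens per second. Hypothetical type for illustration.
type bucket struct {
	rate   float64 // tokens added per second (requests_per_second)
	burst  float64 // maximum stored tokens (burst_limit)
	tokens float64 // tokens currently available
	last   float64 // time of the last refill, in seconds
}

// newBucket starts with a full bucket at time `now` (seconds).
func newBucket(rate float64, burst int, now float64) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow reports whether a request may be sent at time `now` (seconds),
// consuming one token if so.
func (b *bucket) allow(now float64) bool {
	if elapsed := now - b.last; elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst // never store more than the burst size
		}
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	// Values from the README example: 1.0 requests per second, burst of 3.
	b := newBucket(1.0, 3, 0)
	for i := 1; i <= 4; i++ {
		fmt.Println("request", i, "at t=0s allowed:", b.allow(0))
	}
	fmt.Println("request 5 at t=1s allowed:", b.allow(1))
}
```

With these values the first three requests at t=0 pass, the fourth is rejected, and one more passes after a second has elapsed. In real code the seconds would come from a monotonic clock, for example `time.Since(start).Seconds()`.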