docs: update configuration section in README.md

docs: Update README.md CLI flag documentation
feat: Update README.md to reflect recent changes in functionality
2025-12-13 06:23:18 +00:00 · 2024-09-22 18:36:17 -05:00 · 2024-09-22 18:33:24 -05:00 · 2024-09-22 18:31:06 -05:00 · 2024-09-22 18:20:25 -05:00 · 2024-09-22 18:18:03 -05:00
3 changed files with 64 additions and 18 deletions
--- a/README.md
+++ b/README.md
@@ -4,16 +4,18 @@ Rollup aggregates the contents of text-based files and webpages into a markdown

 ## Features

-- File type filtering
-- Ignore patterns for excluding files
-- Support for code-generated file detection
-- Advanced web scraping functionality
-- Verbose logging option for detailed output
-- Exclusionary CSS selectors for web scraping
-- Support for multiple URLs in web scraping
+- File type filtering for targeted content aggregation
+- Ignore patterns for excluding specific files or directories
+- Support for code-generated file detection and exclusion
+- Advanced web scraping functionality with depth control
+- Verbose logging option for detailed operation insights
+- Exclusionary CSS selectors for precise web content extraction
+- Support for multiple URLs in web scraping operations
 - Configurable output format for web scraping (single file or separate files)
-- Configuration file support (YAML)
-- Generation of default configuration file
+- Flexible configuration file support (YAML)
+- Automatic generation of default configuration file
+- Custom output file naming
+- Concurrent processing for improved performance

 ## Installation

@@ -74,14 +76,27 @@ ignore:
 code_generated:
  - **/generated/**
 scrape:
-  urls:
-    - url: https://example.com
+  sites:
+    - base_url: https://example.com
      css_locator: .content
      exclude_selectors:
        - .ads
        - .navigation
+      max_depth: 2
+      allowed_paths:
+        - /blog
+        - /docs
+      exclude_paths:
+        - /admin
      output_alias: example
+      path_overrides:
+        - path: /special-page
+          css_locator: .special-content
+          exclude_selectors:
+            - .special-ads
  output_type: single
+  requests_per_second: 1.0
+  burst_limit: 3
 ```

 ## Examples
@@ -92,10 +107,10 @@ scrape:
   rollup files
   ```

-2. Web scraping with multiple URLs:
+2. Web scraping with multiple URLs and increased concurrency:

   ```bash
-   rollup web --urls=https://example.com,https://another-example.com
+   rollup web --urls=https://example.com,https://another-example.com --concurrent=8
   ```

 3. Generate a default configuration file:
@@ -104,15 +119,25 @@ scrape:
   rollup generate
   ```

-4. Use a custom configuration file:
+4. Use a custom configuration file and specify output:

   ```bash
-   rollup files --config=my-config.yml
+   rollup files --config=my-config.yml --output=project_summary.md
   ```

-5. Web scraping with separate output files:
+5. Web scraping with separate output files and custom timeout:
   ```bash
-   rollup web --urls=https://example.com,https://another-example.com --output=separate
+   rollup web --urls=https://example.com,https://another-example.com --output=separate --timeout=60
+   ```
+
+6. Rollup files with specific types and ignore patterns:
+   ```bash
+   rollup files --types=.go,.md --ignore=vendor/**,*_test.go
+   ```
+
+7. Web scraping with depth and CSS selector:
+   ```bash
+   rollup web --urls=https://example.com --depth=2 --css=.main-content
   ```

 ## Contributing
--- a/cmd/files_test.go
+++ b/cmd/files_test.go
@@ -67,7 +67,7 @@ func TestIsIgnored(t *testing.T) {
 		{"subdir/file.log", true},
 		{"subdir/file.txt", false},
 		{".git/config", true},
-		{"src/.git/config", true},
+		{"src/.git/config", false},
 		{"vendor/package/file.go", true},
 		{"internal/vendor/file.go", false},
 	}
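
Note on the expectation change above: the diff shows only the test table, not `isIgnored` itself or the ignore patterns the test passes in. The following is a minimal sketch of one matching rule that would satisfy the updated table, assuming hypothetical patterns `*.log`, `.git/**`, and `vendor/**`. The name `isIgnoredSketch` and the rule that `dir/**` is anchored at the repository root are assumptions for illustration, not the project's implementation.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isIgnoredSketch reports whether a slash-separated relative path matches any
// ignore pattern. It is a hypothetical stand-in, not the project's isIgnored.
func isIgnoredSketch(relPath string, patterns []string) bool {
	for _, pattern := range patterns {
		// Assumption: "dir/**" is anchored at the root, so only paths that
		// start with "dir/" match. "src/.git/config" does not match ".git/**".
		if strings.HasSuffix(pattern, "/**") {
			dir := strings.TrimSuffix(pattern, "/**")
			if strings.HasPrefix(relPath, dir+"/") {
				return true
			}
			continue
		}
		// Any other pattern is matched against the file's base name only.
		if ok, err := path.Match(pattern, path.Base(relPath)); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// Assumed patterns; the real test's pattern list is not part of this diff.
	patterns := []string{"*.log", ".git/**", "vendor/**"}
	paths := []string{
		"subdir/file.log",
		"subdir/file.txt",
		".git/config",
		"src/.git/config",
		"vendor/package/file.go",
		"internal/vendor/file.go",
	}
	for _, p := range paths {
		fmt.Printf("%-26s ignored=%v\n", p, isIgnoredSketch(p, patterns))
	}
}
```

Under this rule a nested `.git` or `vendor` directory is not ignored unless a pattern names its full path, which is consistent with both `src/.git/config` and `internal/vendor/file.go` expecting `false`.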
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -0,0 +1,21 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.0.3] - 2024-09-22
+
+### Added
+- Implemented web scraping functionality using Playwright
+- Added support for CSS selectors to extract specific content
+- Introduced rate limiting for web requests
+- Created configuration options for scraping settings
+
+### Changed
+- Improved error handling and logging throughout the application
+- Enhanced URL parsing and validation
+
+### Fixed
+- Resolved issues with concurrent scraping operations
Author	SHA1	Message	Date
Arik Jones (aider)	1869dae89a	docs: update configuration section in README.md	2024-09-22 18:36:17 -05:00
Arik Jones (aider)	d3ff7cb862	docs: Update README.md CLI flag documentation	2024-09-22 18:33:24 -05:00
Arik Jones (aider)	ea410e4abb	feat: Update README.md to reflect recent changes in functionality	2024-09-22 18:31:06 -05:00
Arik Jones (aider)	7d8e25b1ad	docs: Add CHANGELOG.md with v0.0.3 release notes	2024-09-22 18:20:25 -05:00
Arik Jones	691832e282	fix: Update expectation	2024-09-22 18:18:03 -05:00