Go to file
Claude ff13012408 fix: address functionality gaps identified in code review
- Wire up --config/-f flag to actually load custom config files
  - Move config loading to PersistentPreRunE in root.go
  - Simplify main.go to just call cmd.Execute()
  - Move Playwright init to web command's PreRunE/PostRunE

- Remove unused functions from cmd/web.go (~90 lines of dead code)
  - Remove writeSingleFile, writeMultipleFiles, generateDefaultFilename
  - Remove scrapeURL, extractAndConvertContent, testExtractAndConvertContent
  - Remove unused mock function from web_test.go

- Add OutputType validation to Config.Validate()
  - Only allow "single", "separate", or empty string
  - Add test cases for valid and invalid output types
2025-11-27 16:05:42 +00:00
2024-09-14 20:15:05 -05:00
2024-10-14 16:09:58 -05:00

Rollup

Rollup aggregates the contents of text-based files and webpages into a markdown file.

Features

  • File type filtering for targeted content aggregation
  • Ignore patterns for excluding specific files or directories
  • Support for code-generated file detection and exclusion
  • Advanced web scraping functionality with depth control
  • Verbose logging option for detailed operation insights
  • Exclusionary CSS selectors for precise web content extraction
  • Support for multiple URLs in web scraping operations
  • Configurable output format for web scraping (single file or separate files)
  • Flexible configuration file support (YAML)
  • Automatic generation of default configuration file
  • Custom output file naming
  • Rate limiting for web scraping to respect server resources

Installation

To install Rollup, make sure you have Go installed on your system, then run:

go get github.com/tnypxl/rollup

Usage

Basic usage:

rollup [command] [flags]

Commands

  • rollup files: Rollup files into a single Markdown file
  • rollup web: Scrape main content from webpages and convert to Markdown
  • rollup generate: Generate a rollup.yml config file

Flags for files command

  • --path, -p: Path to the project directory (default: current directory)
  • --types, -t: Comma-separated list of file extensions to include (default: .go,.md,.txt)
  • --codegen, -g: Comma-separated list of glob patterns for code-generated files
  • --ignore, -i: Comma-separated list of glob patterns for files to ignore

Flags for web command

  • --urls, -u: URLs of the webpages to scrape (comma-separated)
  • --output, -o: Output type: 'single' for one file, 'separate' for multiple files (default: single)
  • --depth, -d: Depth of link traversal (default: 0, only scrape the given URLs)
  • --css: CSS selector to extract specific content
  • --exclude: CSS selectors to exclude from the extracted content (comma-separated)

Global flags

  • --config, -f: Path to the configuration file (default: rollup.yml in the current directory)
  • --verbose, -v: Enable verbose logging

Configuration

Rollup can be configured using a YAML file. By default, it looks for rollup.yml in the current directory. You can specify a different configuration file using the --config flag.

Example rollup.yml:

file_extensions:
  - go
  - md
ignore_paths:
  - node_modules/**
  - vendor/**
  - .git/**
code_generated_paths:
  - **/generated/**
sites:
  - base_url: https://example.com
    css_locator: .content
    exclude_selectors:
      - .ads
      - .navigation
    max_depth: 2
    allowed_paths:
      - /blog
      - /docs
    exclude_paths:
      - /admin
    output_alias: example
    path_overrides:
      - path: /special-page
        css_locator: .special-content
        exclude_selectors:
          - .special-ads
output_type: single
requests_per_second: 1.0
burst_limit: 3

Examples

  1. Rollup files with default configuration:

    rollup files
    
  2. Web scraping with multiple URLs:

    rollup web --urls=https://example.com,https://another-example.com
    
  3. Generate a default configuration file:

    rollup generate
    
  4. Use a custom configuration file:

    rollup files --config=my-config.yml
    
  5. Web scraping with separate output files:

    rollup web --urls=https://example.com,https://another-example.com --output=separate
    
  6. Rollup files with specific types and ignore patterns:

    rollup files --types=go,md --ignore=vendor/**,*_test.go
    
  7. Web scraping with depth and CSS selector:

    rollup web --urls=https://example.com --depth=2 --css=.main-content
    

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Description
Compiles text-based files into a single markdown document
Readme MIT 229 KiB
Languages
Go 100%