rollup

Mirror of https://github.com/tnypxl/rollup.git, synced 2025-12-15 15:03:17 +00:00

Author	SHA1	Message	Date
Claude	ff13012408	fix: address functionality gaps identified in code review - Wire up --config/-f flag to actually load custom config files - Move config loading to PersistentPreRunE in root.go - Simplify main.go to just call cmd.Execute() - Move Playwright init to web command's PreRunE/PostRunE - Remove unused functions from cmd/web.go (~90 lines of dead code) - Remove writeSingleFile, writeMultipleFiles, generateDefaultFilename - Remove scrapeURL, extractAndConvertContent, testExtractAndConvertContent - Remove unused mock function from web_test.go - Add OutputType validation to Config.Validate() - Only allow "single", "separate", or empty string - Add test cases for valid and invalid output types	2025-11-27 16:05:42 +00:00
Claude	09608cf073	fix: resolve 5 bugs identified in code review - Fix malformed YAML in config_test.go (incorrect indentation) - Add validation for empty file_extensions in Config.Validate() - Remove obsolete max_depth test case (field no longer exists) - Remove unused global cfg variable in main.go - Fix race condition in ScrapeSites by counting URLs before goroutines - Remove unreachable JavaScript code in scroll script, add proper delay - Standardize file extensions to not include leading dot	2025-11-27 15:56:37 +00:00
Arik Jones	9341a51d09	fix multi-file output	2024-12-06 17:02:31 -06:00
Arik Jones	645626f763	remove maxdepth from tests	2024-12-06 15:17:33 -06:00
tnypxl	02e39baf38	flatten scrape config to 'sites:' * flatten scrape config to 'sites:'. Update unit tests and readme. * remove check for file_extensions configuration. * show progress indication after 5 seconds. * add documentation to functions * fix: remove MaxDepth and link extraction functionality * fix: Remove MaxDepth references from cmd/web.go	2024-10-14 16:09:58 -05:00
Arik Jones	333b9a366c	fix: Resolve playwright function deprecations and io/ioutil function deprecations.	2024-09-24 15:13:36 -05:00
Arik Jones (aider)	d5a94f5468	fix: remove indentation while preserving HTML structure in ExtractContentWithCSS	2024-09-22 17:00:16 -05:00
Arik Jones (aider)	59994c085c	fix: improve file ignore logic and preserve newlines in extracted content	2024-09-22 16:58:53 -05:00
Arik Jones (aider)	364b185269	fix: resolve test failures in TestRunRollup, TestExtractContentWithCSS, and TestExtractLinks	2024-09-21 16:04:20 -05:00
Arik Jones (aider)	952c2dda02	refactor: update browser initialization in scraper tests	2024-09-21 16:01:51 -05:00
Arik Jones (aider)	de84d68b4c	test: initialize browser before running ExtractLinks test	2024-09-21 16:01:08 -05:00
Arik Jones (aider)	e5d4c514a7	fix: resolve build errors in test files	2024-09-21 15:59:39 -05:00
Arik Jones (aider)	6ff44f81bb	fix: resolve nil pointer dereference in ExtractContentWithCSS test	2024-09-21 15:59:08 -05:00
Arik Jones (aider)	2fd411ce65	test: add debugging info and fix reflect import	2024-09-21 15:57:05 -05:00
Arik Jones	73116e8d82	Fix logging and other issues preventing scraping	2024-09-21 15:54:33 -05:00
Arik Jones	160a15dbb1	fix: Use logger instead of log. Move web subcommand initialization to root.go	2024-09-19 11:44:27 -05:00
Arik Jones (aider)	7f468a05bd	feat: install only Chromium browser	2024-09-17 14:51:09 -05:00
Arik Jones (aider)	4586b5daaa	fix: Install Playwright and browsers before initializing	2024-09-17 14:48:15 -05:00
Arik Jones (aider)	53dcd6eb71	feat: Add support for exclusionary CSS paths in config.go	2024-09-14 20:59:08 -05:00
Arik Jones (aider)	c1755836b5	fix: Move HTML to Markdown conversion to scraper.go	2024-09-14 20:55:35 -05:00
Arik Jones (aider)	6f4750c900	fix: Remove references to non-existent CSSLocator field in Config struct	2024-09-14 20:36:31 -05:00
Arik Jones (aider)	52c7de255d	feat: Implement scraping of multiple URLs with optional CSS locators and separate output files	2024-09-14 20:35:35 -05:00
Arik Jones (aider)	23508df6f4	feat: Add optional logging to the scraper	2024-09-14 19:59:02 -05:00
Arik Jones	01d6b2f54f	fix: Improve page content extraction in scraper	2024-09-14 19:59:01 -05:00
Arik Jones (aider)	3378402fb9	fix: Handle missing content in ProcessHTMLContent	2024-09-14 19:43:58 -05:00
Arik Jones	2ab0d74279	fix: Update scraper to handle empty URLs	2024-09-14 19:42:38 -05:00
Arik Jones (aider)	eaa7135eab	feat: Improve content extraction with fallback to body	2024-09-14 17:05:05 -05:00
Arik Jones (aider)	7cdd68d020	feat: Separate include and exclude selectors in web scraper	2024-09-14 16:59:59 -05:00
Arik Jones (aider)	39e06ee9d5	fix: remove space between minus and CSS path in parseSelectors	2024-09-14 16:54:34 -05:00
Arik Jones (aider)	d66fd04016	fix: Use `-` instead of `!` to filter unwanted elements	2024-09-14 16:53:42 -05:00
Arik Jones (aider)	56d5a8a194	refactor: Remove XPath support	2024-09-14 16:51:18 -05:00
Arik Jones (aider)	09f8ed07c2	fix: Remove unused variable `excludeXPaths` in `ExtractContentWithXPath` function	2024-09-14 16:50:34 -05:00
Arik Jones (aider)	f1af20e95e	feat: Add support for excluding child elements in content extraction	2024-09-14 16:49:32 -05:00
Arik Jones (aider)	d0ee666b07	refactor: Modify scraper to capture only the main content	2024-09-14 15:20:15 -05:00
Arik Jones (aider)	1a57be80fa	fix: Remove print media emulation and improve CSS selector extraction	2024-09-14 15:14:53 -05:00
Arik Jones (aider)	ea12ad631c	fix: Fix assignment mismatch in ExtractContentWithCSS function	2024-09-14 14:54:04 -05:00
Arik Jones (aider)	885f3fc2b8	feat: Add missing scraper functions	2024-09-14 14:52:45 -05:00
Arik Jones	0163c4e504	Adds a configuration layer using rollup.yml, which may be preferred over CLI flags.	2024-09-05 23:41:39 -05:00

38 Commits