Claude
ff13012408
fix: address functionality gaps identified in code review
- Wire up --config/-f flag to actually load custom config files
- Move config loading to PersistentPreRunE in root.go
- Simplify main.go to just call cmd.Execute()
- Move Playwright init to web command's PreRunE/PostRunE
- Remove unused functions from cmd/web.go (~90 lines of dead code)
  - Remove writeSingleFile, writeMultipleFiles, generateDefaultFilename
  - Remove scrapeURL, extractAndConvertContent, testExtractAndConvertContent
- Remove unused mock function from web_test.go
- Add OutputType validation to Config.Validate()
  - Only allow "single", "separate", or empty string
  - Add test cases for valid and invalid output types
2025-11-27 16:05:42 +00:00
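The OutputType rule described in the commit above can be sketched as follows. This is a minimal illustration, not the project's code: the `Config` struct is trimmed to a single `OutputType` field, the empty string is assumed to mean "use the default", and the error wording is invented.

```go
package main

import "fmt"

// Config is a hypothetical, trimmed-down stand-in for the project's Config;
// only the field needed for this check is shown.
type Config struct {
	OutputType string
}

// Validate accepts "single", "separate", or "" (assumed to mean "default")
// and rejects any other value.
func (c *Config) Validate() error {
	switch c.OutputType {
	case "", "single", "separate":
		return nil
	default:
		return fmt.Errorf("invalid output_type %q: must be \"single\" or \"separate\"", c.OutputType)
	}
}

func main() {
	for _, ot := range []string{"", "single", "separate", "multiple"} {
		cfg := Config{OutputType: ot}
		fmt.Printf("%q -> %v\n", ot, cfg.Validate())
	}
}
```

A switch keeps the allowed set in one place, so adding a new output mode later is a one-line change alongside its test case.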
Claude
09608cf073
fix: resolve 5 bugs identified in code review
- Fix malformed YAML in config_test.go (incorrect indentation)
- Add validation for empty file_extensions in Config.Validate()
- Remove obsolete max_depth test case (field no longer exists)
- Remove unused global cfg variable in main.go
- Fix race condition in ScrapeSites by counting URLs before goroutines
- Remove unreachable JavaScript code in scroll script, add proper delay
- Standardize file extensions to not include leading dot
2025-11-27 15:56:37 +00:00
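The ScrapeSites race fix ("counting URLs before goroutines") follows a common Go pattern: compute the total work up front so the WaitGroup counter and result slice are sized before any goroutine runs. The sketch below assumes a hypothetical `Site` type with a base URL and paths and a caller-supplied `scrape` function; it is not the project's actual ScrapeSites.

```go
package main

import (
	"fmt"
	"sync"
)

// Site is a hypothetical stand-in for one entry under `sites:`.
type Site struct {
	BaseURL string
	Paths   []string
}

// scrapeAll counts every URL before starting any goroutine, so the total and
// the WaitGroup counter are fixed and never mutated concurrently.
func scrapeAll(sites []Site, scrape func(url string) string) []string {
	total := 0
	for _, s := range sites {
		total += len(s.Paths)
	}

	results := make([]string, total) // each goroutine writes only its own index
	var wg sync.WaitGroup
	wg.Add(total)

	i := 0
	for _, s := range sites {
		for _, p := range s.Paths {
			// idx and url are passed as arguments, so each goroutine gets its own copy.
			go func(idx int, url string) {
				defer wg.Done()
				results[idx] = scrape(url)
			}(i, s.BaseURL+p)
			i++
		}
	}
	wg.Wait()
	return results
}

func main() {
	sites := []Site{
		{BaseURL: "https://example.com", Paths: []string{"/a", "/b"}},
		{BaseURL: "https://example.org", Paths: []string{"/c"}},
	}
	for _, r := range scrapeAll(sites, func(url string) string { return "scraped " + url }) {
		fmt.Println(r)
	}
}
```

Writing into pre-assigned indices also keeps the output order stable regardless of which goroutine finishes first.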
Arik Jones
9341a51d09
fix multi-file output
2024-12-06 17:02:31 -06:00
Arik Jones
645626f763
remove maxdepth from tests
2024-12-06 15:17:33 -06:00
tnypxl
02e39baf38
flatten scrape config to 'sites:'
* flatten scrape config to 'sites:'. Update unit tests and readme.
* remove check for file_extensions configuration.
* show progress indication after 5 seconds.
* add documentation to functions
* fix: remove MaxDepth and link extraction functionality
* fix: Remove MaxDepth references from cmd/web.go
2024-10-14 16:09:58 -05:00
333b9a366c
fix: Resolve playwright function deprecations and io/ioutil function deprecations.
2024-09-24 15:13:36 -05:00
Arik Jones (aider)
d5a94f5468
fix: remove indentation while preserving HTML structure in ExtractContentWithCSS
2024-09-22 17:00:16 -05:00
Arik Jones (aider)
59994c085c
fix: improve file ignore logic and preserve newlines in extracted content
2024-09-22 16:58:53 -05:00
Arik Jones (aider)
364b185269
fix: resolve test failures in TestRunRollup, TestExtractContentWithCSS, and TestExtractLinks
2024-09-21 16:04:20 -05:00
Arik Jones (aider)
952c2dda02
refactor: update browser initialization in scraper tests
2024-09-21 16:01:51 -05:00
Arik Jones (aider)
de84d68b4c
test: initialize browser before running ExtractLinks test
2024-09-21 16:01:08 -05:00
Arik Jones (aider)
e5d4c514a7
fix: resolve build errors in test files
2024-09-21 15:59:39 -05:00
Arik Jones (aider)
6ff44f81bb
fix: resolve nil pointer dereference in ExtractContentWithCSS test
2024-09-21 15:59:08 -05:00
Arik Jones (aider)
2fd411ce65
test: add debugging info and fix reflect import
2024-09-21 15:57:05 -05:00
Arik Jones
73116e8d82
Fix logging and other issues preventing scraping
2024-09-21 15:54:33 -05:00
Arik Jones
160a15dbb1
fix: Use logger instead of log. Move web subcommand initialization to root.go
2024-09-19 11:44:27 -05:00
Arik Jones (aider)
7f468a05bd
feat: install only Chromium browser
2024-09-17 14:51:09 -05:00
Arik Jones (aider)
4586b5daaa
fix: Install Playwright and browsers before initializing
2024-09-17 14:48:15 -05:00
Arik Jones (aider)
53dcd6eb71
feat: Add support for exclusionary CSS paths in config.go
2024-09-14 20:59:08 -05:00
Arik Jones (aider)
c1755836b5
fix: Move HTML to Markdown conversion to scraper.go
2024-09-14 20:55:35 -05:00
Arik Jones (aider)
6f4750c900
fix: Remove references to non-existent CSSLocator field in Config struct
2024-09-14 20:36:31 -05:00
Arik Jones (aider)
52c7de255d
feat: Implement scraping of multiple URLs with optional CSS locators and separate output files
2024-09-14 20:35:35 -05:00
Arik Jones (aider)
23508df6f4
feat: Add optional logging to the scraper
2024-09-14 19:59:02 -05:00
Arik Jones
01d6b2f54f
fix: Improve page content extraction in scraper
2024-09-14 19:59:01 -05:00
Arik Jones (aider)
3378402fb9
fix: Handle missing content in ProcessHTMLContent
2024-09-14 19:43:58 -05:00
Arik Jones
2ab0d74279
fix: Update scraper to handle empty URLs
2024-09-14 19:42:38 -05:00
Arik Jones (aider)
eaa7135eab
feat: Improve content extraction with fallback to body
2024-09-14 17:05:05 -05:00
Arik Jones (aider)
7cdd68d020
feat: Separate include and exclude selectors in web scraper
2024-09-14 16:59:59 -05:00
Arik Jones (aider)
39e06ee9d5
fix: remove space between minus and CSS path in parseSelectors
2024-09-14 16:54:34 -05:00
Arik Jones (aider)
d66fd04016
fix: Use - instead of ! to filter unwanted elements
2024-09-14 16:53:42 -05:00
Arik Jones (aider)
56d5a8a194
refactor: Remove XPath support
2024-09-14 16:51:18 -05:00
Arik Jones (aider)
09f8ed07c2
fix: Remove unused variable excludeXPaths in ExtractContentWithXPath function
2024-09-14 16:50:34 -05:00
Arik Jones (aider)
f1af20e95e
feat: Add support for excluding child elements in content extraction
2024-09-14 16:49:32 -05:00
Arik Jones (aider)
d0ee666b07
refactor: Modify scraper to capture only the main content
2024-09-14 15:20:15 -05:00
Arik Jones (aider)
1a57be80fa
fix: Remove print media emulation and improve CSS selector extraction
2024-09-14 15:14:53 -05:00
Arik Jones (aider)
ea12ad631c
fix: Fix assignment mismatch in ExtractContentWithCSS function
2024-09-14 14:54:04 -05:00
Arik Jones (aider)
885f3fc2b8
feat: Add missing scraper functions
2024-09-14 14:52:45 -05:00
Arik Jones
0163c4e504
Adds a configuration layer using rollup.yml, which may be preferred over CLI flags.
2024-09-05 23:41:39 -05:00