96 Commits

Author SHA1 Message Date
Arik Jones (aider)
c77ae918c5 refactor: remove redundant variable declarations in test file 2024-09-19 16:25:30 -05:00
Arik Jones (aider)
1b696ce9c6 refactor: use wrapper functions for easier testing 2024-09-19 16:25:02 -05:00
Arik Jones (aider)
df1178cb03 test: refactor TestScrapeURL to use local mock functions 2024-09-19 16:24:23 -05:00
Arik Jones (aider)
c4831dfea2 fix: resolve compilation errors in web_test.go 2024-09-19 16:23:40 -05:00
Arik Jones (aider)
3c22d8034d fix: correct import path and update Config struct usage in test 2024-09-19 16:23:01 -05:00
Arik Jones (aider)
c7791814c9 fix: add missing imports and correct Config reference in files_test.go 2024-09-19 16:21:11 -05:00
Arik Jones (aider)
e184cef444 test: add unit tests for cmd and internal packages 2024-09-19 16:15:32 -05:00
Arik Jones (aider)
702665bb2e fix: import config package to resolve undefined error 2024-09-19 16:12:30 -05:00
Arik Jones (aider)
1d02cab585 fix: resolve type mismatch for PathOverrides in SiteConfig 2024-09-19 16:11:14 -05:00
Arik Jones (aider)
e3fddf101c fix: resolve undefined types and import issues in scraper.go 2024-09-19 16:10:06 -05:00
Arik Jones (aider)
569ff9924d feat: implement site-based scraping with path overrides 2024-09-19 16:06:55 -05:00
Arik Jones (aider)
1d38e4157c fix: add Scrape field to Config struct and create ScrapeConfig 2024-09-19 15:23:35 -05:00
Arik Jones (aider)
d44fabf783 feat: implement rate limiting for URL scraping 2024-09-19 15:22:02 -05:00
Arik Jones
f9eee282bc docs: Update readme to include generate command that produces default config file. 2024-09-19 12:22:42 -05:00
Arik Jones (aider)
fca1422104 refactor: improve generate command and use config package 2024-09-19 12:08:34 -05:00
Arik Jones (aider)
2e563836f3 feat: add generate subcommand for creating rollup.yml config 2024-09-19 12:08:09 -05:00
Arik Jones
160a15dbb1 fix: Use logger instead of log. Move web subcommand initialization to root.go 2024-09-19 11:44:27 -05:00
Arik Jones (aider)
eabf1ba23f feat: add files subcommand and refactor rollup functionality 2024-09-19 11:38:09 -05:00
Arik Jones
1e88fae75d docs: Update the readme 2024-09-19 11:08:13 -05:00
Arik Jones
eba453f09e fix: rollup output file name (again) 2024-09-19 11:02:35 -05:00
Arik Jones
d3ba28d03b fix: Output markdown files should end in *.rollup.md 2024-09-19 10:56:00 -05:00
Arik Jones
197f3affc7 fix: Don't use PersistentPreRunE. Caused the actual runRollup function to never run. 2024-09-19 10:43:23 -05:00
Arik Jones (aider)
7f468a05bd feat: install only Chromium browser 2024-09-17 14:51:09 -05:00
Arik Jones (aider)
4586b5daaa fix: Install Playwright and browsers before initializing 2024-09-17 14:48:15 -05:00
Arik Jones (aider)
056c3e368e fix: Update import and usage of Config type in cmd/root.go 2024-09-16 09:53:44 -05:00
Arik Jones (aider)
21d3e8ee68 fix: Handle missing configuration file for help command 2024-09-16 09:52:48 -05:00
Arik Jones (aider)
efee186ae0 fix: Skip config loading and rollup execution for help command 2024-09-16 09:52:25 -05:00
Arik Jones
41fb9e3fad Correction in web scraping example. 2024-09-14 21:38:17 -05:00
Arik Jones (aider)
6cb2f03d74 feat: Add web scraping functionality and exclusionary CSS paths 2024-09-14 21:26:59 -05:00
Arik Jones
bb12e3d029 fix: Something in root 2024-09-14 21:25:50 -05:00
Arik Jones (aider)
53dcd6eb71 feat: Add support for exclusionary CSS paths in config.go 2024-09-14 20:59:08 -05:00
Arik Jones (aider)
ece9492b30 fix: Remove unused import in cmd/web.go 2024-09-14 20:56:51 -05:00
Arik Jones (aider)
c1755836b5 fix: Move HTML to Markdown conversion to scraper.go 2024-09-14 20:55:35 -05:00
Arik Jones
939cffb55e fix: Simplify sanitizeFilename function 2024-09-14 20:55:34 -05:00
Arik Jones (aider)
b6de9d211b fix: Merge duplicate runWeb function and add missing function definitions 2024-09-14 20:42:10 -05:00
Arik Jones (aider)
a6ebf0062a fix: Add --verbose flag to web subcommand 2024-09-14 20:41:23 -05:00
Arik Jones (aider)
aaff602b3e fix: Use local getFilenameFromContent function instead of undefined scraper.GetFilenameFromContent 2024-09-14 20:38:06 -05:00
Arik Jones (aider)
6f4750c900 fix: Remove references to non-existent CSSLocator field in Config struct 2024-09-14 20:36:31 -05:00
Arik Jones (aider)
52c7de255d feat: Implement scraping of multiple URLs with optional CSS locators and separate output files 2024-09-14 20:35:35 -05:00
Arik Jones (aider)
5264023cba feat: add MIT license 2024-09-14 20:15:05 -05:00
Arik Jones (aider)
87c2a81375 feat: Add README.md 2024-09-14 20:13:00 -05:00
Arik Jones (aider)
b1db362a94 fix: Initialize logger before calling InitPlaywright 2024-09-14 19:59:39 -05:00
Arik Jones (aider)
23508df6f4 feat: Add optional logging to the scraper 2024-09-14 19:59:02 -05:00
Arik Jones
01d6b2f54f fix: Improve page content extraction in scraper 2024-09-14 19:59:01 -05:00
Arik Jones (aider)
3378402fb9 fix: Handle missing content in ProcessHTMLContent 2024-09-14 19:43:58 -05:00
Arik Jones
2ab0d74279 fix: Update scraper to handle empty URLs 2024-09-14 19:42:38 -05:00
Arik Jones (aider)
eaa7135eab feat: Improve content extraction with fallback to body 2024-09-14 17:05:05 -05:00
Arik Jones (aider)
f4c368e112 fix: Update web command to properly handle --exclude flag 2024-09-14 17:02:44 -05:00
Arik Jones (aider)
d80151b9eb fix: reorder flag definitions in cmd/web.go 2024-09-14 17:01:49 -05:00
Arik Jones (aider)
9196708426 fix: Update web command flags 2024-09-14 17:01:17 -05:00