Commit Graph

79 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Arik Jones (aider) | eabf1ba23f | feat: add files subcommand and refactor rollup functionality | 2024-09-19 11:38:09 -05:00 |
| Arik Jones | 1e88fae75d | docs: Update the readme | 2024-09-19 11:08:13 -05:00 |
| Arik Jones | eba453f09e | fix: rollup output file name (again) | 2024-09-19 11:02:35 -05:00 |
| Arik Jones | d3ba28d03b | fix: Output markdown files should end in *.rollup.md | 2024-09-19 10:56:00 -05:00 |
| Arik Jones | 197f3affc7 | fix: Don't use PersistentPreRunE. Caused the actuall runRollup function to never run. | 2024-09-19 10:43:23 -05:00 |
| Arik Jones (aider) | 7f468a05bd | feat: install only Chromium browser | 2024-09-17 14:51:09 -05:00 |
| Arik Jones (aider) | 4586b5daaa | fix: Install Playwright and browsers before initializing | 2024-09-17 14:48:15 -05:00 |
| Arik Jones (aider) | 056c3e368e | fix: Update import and usage of Config type in cmd/root.go | 2024-09-16 09:53:44 -05:00 |
| Arik Jones (aider) | 21d3e8ee68 | fix: Handle missing configuration file for help command | 2024-09-16 09:52:48 -05:00 |
| Arik Jones (aider) | efee186ae0 | fix: Skip config loading and rollup execution for help command | 2024-09-16 09:52:25 -05:00 |
| Arik Jones | 41fb9e3fad | Correction in web scraping example. | 2024-09-14 21:38:17 -05:00 |
| Arik Jones (aider) | 6cb2f03d74 | feat: Add web scraping functionality and exclusionary CSS paths | 2024-09-14 21:26:59 -05:00 |
| Arik Jones | bb12e3d029 | fix: Something in root | 2024-09-14 21:25:50 -05:00 |
| Arik Jones (aider) | 53dcd6eb71 | feat: Add support for exclusionary CSS paths in config.go | 2024-09-14 20:59:08 -05:00 |
| Arik Jones (aider) | ece9492b30 | fix: Remove unused import in cmd/web.go | 2024-09-14 20:56:51 -05:00 |
| Arik Jones (aider) | c1755836b5 | fix: Move HTML to Markdown conversion to scraper.go | 2024-09-14 20:55:35 -05:00 |
| Arik Jones | 939cffb55e | fix: Simplify sanitizeFilename function | 2024-09-14 20:55:34 -05:00 |
| Arik Jones (aider) | b6de9d211b | fix: Merge duplicate runWeb function and add missing function definitions | 2024-09-14 20:42:10 -05:00 |
| Arik Jones (aider) | a6ebf0062a | fix: Add --verbose flag to web subcommand | 2024-09-14 20:41:23 -05:00 |
| Arik Jones (aider) | aaff602b3e | fix: Use local getFilenameFromContent function instead of undefined scraper.GetFilenameFromContent | 2024-09-14 20:38:06 -05:00 |
| Arik Jones (aider) | 6f4750c900 | fix: Remove references to non-existent CSSLocator field in Config struct | 2024-09-14 20:36:31 -05:00 |
| Arik Jones (aider) | 52c7de255d | feat: Implement scraping of multiple URLs with optional CSS locators and separate output files | 2024-09-14 20:35:35 -05:00 |
| Arik Jones (aider) | 5264023cba | feat: add MIT license | 2024-09-14 20:15:05 -05:00 |
| Arik Jones (aider) | 87c2a81375 | feat: Add README.md | 2024-09-14 20:13:00 -05:00 |
| Arik Jones (aider) | b1db362a94 | fix: Initialize logger before calling InitPlaywright | 2024-09-14 19:59:39 -05:00 |
| Arik Jones (aider) | 23508df6f4 | feat: Add optional logging to the scraper | 2024-09-14 19:59:02 -05:00 |
| Arik Jones | 01d6b2f54f | fix: Improve page content extraction in scraper | 2024-09-14 19:59:01 -05:00 |
| Arik Jones (aider) | 3378402fb9 | fix: Handle missing content in ProcessHTMLContent | 2024-09-14 19:43:58 -05:00 |
| Arik Jones | 2ab0d74279 | fix: Update scraper to handle empty URLs | 2024-09-14 19:42:38 -05:00 |
| Arik Jones (aider) | eaa7135eab | feat: Improve content extraction with fallback to body | 2024-09-14 17:05:05 -05:00 |
| Arik Jones (aider) | f4c368e112 | fix: Update web command to properly handle --exclude flag | 2024-09-14 17:02:44 -05:00 |
| Arik Jones (aider) | d80151b9eb | fix: reorder flag definitions in cmd/web.go | 2024-09-14 17:01:49 -05:00 |
| Arik Jones (aider) | 9196708426 | fix: Update web command flags | 2024-09-14 17:01:17 -05:00 |
| Arik Jones (aider) | 7cdd68d020 | feat: Separate include and exclude selectors in web scraper | 2024-09-14 16:59:59 -05:00 |
| Arik Jones (aider) | 39e06ee9d5 | fix: remove space between minus and CSS path in parseSelectors | 2024-09-14 16:54:34 -05:00 |
| Arik Jones (aider) | d66fd04016 | fix: Use - instead of ! to filter unwanted elements | 2024-09-14 16:53:42 -05:00 |
| Arik Jones (aider) | e50484a6fa | fix: Remove XPath-related code from cmd/web.go | 2024-09-14 16:51:54 -05:00 |
| Arik Jones (aider) | 56d5a8a194 | refactor: Remove XPath support | 2024-09-14 16:51:18 -05:00 |
| Arik Jones (aider) | 09f8ed07c2 | fix: Remove unused variable excludeXPaths in ExtractContentWithXPath function | 2024-09-14 16:50:34 -05:00 |
| Arik Jones (aider) | f1af20e95e | feat: Add support for excluding child elements in content extraction | 2024-09-14 16:49:32 -05:00 |
| Arik Jones (aider) | d0ee666b07 | refactor: Modify scraper to capture only the main content | 2024-09-14 15:20:15 -05:00 |
| Arik Jones (aider) | bfd70fd786 | fix: Add import for scraper package in cmd/root.go | 2024-09-14 15:17:18 -05:00 |
| Arik Jones (aider) | 8b85d755af | fix: Update Execute function to accept configuration and scraper config | 2024-09-14 15:17:00 -05:00 |
| Arik Jones (aider) | 9660a12549 | fix: remove unused import of "github.com/tnypxl/rollup/internal/config" | 2024-09-14 15:16:36 -05:00 |
| Arik Jones (aider) | 8e89621ef0 | fix: Remove redeclaration of cfg in cmd/web.go | 2024-09-14 15:16:11 -05:00 |
| Arik Jones (aider) | 595c451ad9 | feat: Pass scraper configuration to command execution | 2024-09-14 15:15:39 -05:00 |
| Arik Jones (aider) | 1a57be80fa | fix: Remove print media emulation and improve CSS selector extraction | 2024-09-14 15:14:53 -05:00 |
| Arik Jones | a3b23a6d34 | ... | 2024-09-14 15:11:24 -05:00 |
| Arik Jones (aider) | 8932f503c6 | feat: Pass configuration to command execution | 2024-09-14 15:09:57 -05:00 |
| Arik Jones (aider) | ea12ad631c | fix: Fix assignment mismatch in ExtractContentWithCSS function | 2024-09-14 14:54:04 -05:00 |