| Author | Commit | Message | Date |
|---|---|---|---|
| Arik Jones (aider) | efee186ae0 | fix: Skip config loading and rollup execution for help command | 2024-09-16 09:52:25 -05:00 |
| Arik Jones | 41fb9e3fad | Correction in web scraping example. | 2024-09-14 21:38:17 -05:00 |
| Arik Jones (aider) | 6cb2f03d74 | feat: Add web scraping functionality and exclusionary CSS paths | 2024-09-14 21:26:59 -05:00 |
| Arik Jones | bb12e3d029 | fix: Something in root | 2024-09-14 21:25:50 -05:00 |
| Arik Jones (aider) | 53dcd6eb71 | feat: Add support for exclusionary CSS paths in config.go | 2024-09-14 20:59:08 -05:00 |
| Arik Jones (aider) | ece9492b30 | fix: Remove unused import in cmd/web.go | 2024-09-14 20:56:51 -05:00 |
| Arik Jones (aider) | c1755836b5 | fix: Move HTML to Markdown conversion to scraper.go | 2024-09-14 20:55:35 -05:00 |
| Arik Jones | 939cffb55e | fix: Simplify sanitizeFilename function | 2024-09-14 20:55:34 -05:00 |
| Arik Jones (aider) | b6de9d211b | fix: Merge duplicate runWeb function and add missing function definitions | 2024-09-14 20:42:10 -05:00 |
| Arik Jones (aider) | a6ebf0062a | fix: Add --verbose flag to web subcommand | 2024-09-14 20:41:23 -05:00 |
| Arik Jones (aider) | aaff602b3e | fix: Use local getFilenameFromContent function instead of undefined scraper.GetFilenameFromContent | 2024-09-14 20:38:06 -05:00 |
| Arik Jones (aider) | 6f4750c900 | fix: Remove references to non-existent CSSLocator field in Config struct | 2024-09-14 20:36:31 -05:00 |
| Arik Jones (aider) | 52c7de255d | feat: Implement scraping of multiple URLs with optional CSS locators and separate output files | 2024-09-14 20:35:35 -05:00 |
| Arik Jones (aider) | 5264023cba | feat: add MIT license | 2024-09-14 20:15:05 -05:00 |
| Arik Jones (aider) | 87c2a81375 | feat: Add README.md | 2024-09-14 20:13:00 -05:00 |
| Arik Jones (aider) | b1db362a94 | fix: Initialize logger before calling InitPlaywright | 2024-09-14 19:59:39 -05:00 |
| Arik Jones (aider) | 23508df6f4 | feat: Add optional logging to the scraper | 2024-09-14 19:59:02 -05:00 |
| Arik Jones | 01d6b2f54f | fix: Improve page content extraction in scraper | 2024-09-14 19:59:01 -05:00 |
| Arik Jones (aider) | 3378402fb9 | fix: Handle missing content in ProcessHTMLContent | 2024-09-14 19:43:58 -05:00 |
| Arik Jones | 2ab0d74279 | fix: Update scraper to handle empty URLs | 2024-09-14 19:42:38 -05:00 |
| Arik Jones (aider) | eaa7135eab | feat: Improve content extraction with fallback to body | 2024-09-14 17:05:05 -05:00 |
| Arik Jones (aider) | f4c368e112 | fix: Update web command to properly handle --exclude flag | 2024-09-14 17:02:44 -05:00 |
| Arik Jones (aider) | d80151b9eb | fix: reorder flag definitions in cmd/web.go | 2024-09-14 17:01:49 -05:00 |
| Arik Jones (aider) | 9196708426 | fix: Update web command flags | 2024-09-14 17:01:17 -05:00 |
| Arik Jones (aider) | 7cdd68d020 | feat: Separate include and exclude selectors in web scraper | 2024-09-14 16:59:59 -05:00 |
| Arik Jones (aider) | 39e06ee9d5 | fix: remove space between minus and CSS path in parseSelectors | 2024-09-14 16:54:34 -05:00 |
| Arik Jones (aider) | d66fd04016 | fix: Use - instead of ! to filter unwanted elements | 2024-09-14 16:53:42 -05:00 |
| Arik Jones (aider) | e50484a6fa | fix: Remove XPath-related code from cmd/web.go | 2024-09-14 16:51:54 -05:00 |
| Arik Jones (aider) | 56d5a8a194 | refactor: Remove XPath support | 2024-09-14 16:51:18 -05:00 |
| Arik Jones (aider) | 09f8ed07c2 | fix: Remove unused variable excludeXPaths in ExtractContentWithXPath function | 2024-09-14 16:50:34 -05:00 |
| Arik Jones (aider) | f1af20e95e | feat: Add support for excluding child elements in content extraction | 2024-09-14 16:49:32 -05:00 |
| Arik Jones (aider) | d0ee666b07 | refactor: Modify scraper to capture only the main content | 2024-09-14 15:20:15 -05:00 |
| Arik Jones (aider) | bfd70fd786 | fix: Add import for scraper package in cmd/root.go | 2024-09-14 15:17:18 -05:00 |
| Arik Jones (aider) | 8b85d755af | fix: Update Execute function to accept configuration and scraper config | 2024-09-14 15:17:00 -05:00 |
| Arik Jones (aider) | 9660a12549 | fix: remove unused import of "github.com/tnypxl/rollup/internal/config" | 2024-09-14 15:16:36 -05:00 |
| Arik Jones (aider) | 8e89621ef0 | fix: Remove redeclaration of cfg in cmd/web.go | 2024-09-14 15:16:11 -05:00 |
| Arik Jones (aider) | 595c451ad9 | feat: Pass scraper configuration to command execution | 2024-09-14 15:15:39 -05:00 |
| Arik Jones (aider) | 1a57be80fa | fix: Remove print media emulation and improve CSS selector extraction | 2024-09-14 15:14:53 -05:00 |
| Arik Jones | a3b23a6d34 | ... | 2024-09-14 15:11:24 -05:00 |
| Arik Jones (aider) | 8932f503c6 | feat: Pass configuration to command execution | 2024-09-14 15:09:57 -05:00 |
| Arik Jones (aider) | ea12ad631c | fix: Fix assignment mismatch in ExtractContentWithCSS function | 2024-09-14 14:54:04 -05:00 |
| Arik Jones (aider) | 885f3fc2b8 | feat: Add missing scraper functions | 2024-09-14 14:52:45 -05:00 |
| Arik Jones | 3390606916 | feat: Add support for time package in web.go | 2024-09-14 14:52:44 -05:00 |
| Arik Jones (aider) | 50c9e7898d | feat: Implement recursive web scraping and content extraction | 2024-09-14 14:46:34 -05:00 |
| Arik Jones | cf99bd8bf1 | feat: Implement web command functionality | 2024-09-14 14:46:31 -05:00 |
| Arik Jones (aider) | d74213e4ff | fix: resolve build errors in cmd/web.go | 2024-09-14 14:43:18 -05:00 |
| Arik Jones (aider) | 0494d9433f | feat: Add depth, CSS, and XPath options to web command | 2024-09-14 14:42:21 -05:00 |
| Arik Jones (aider) | 514bcacd8a | feat: Implement recursive web scraping with configurable depth and content extraction | 2024-09-14 14:41:54 -05:00 |
| Arik Jones | 0163c4e504 | Adds a configuration layer for use rollup.yml which may be preferred over CLI flags. | 2024-09-05 23:41:39 -05:00 |
| Arik Jones (aider) | f376f186c2 | fix: Update cmd/root.go to use the correct field name for ignore patterns | 2024-09-05 23:08:38 -05:00 |