Arik Jones (aider) | d5a94f5468 | fix: remove indentation while preserving HTML structure in ExtractContentWithCSS | 2024-09-22 17:00:16 -05:00
Arik Jones (aider) | 59994c085c | fix: improve file ignore logic and preserve newlines in extracted content | 2024-09-22 16:58:53 -05:00
Arik Jones (aider) | 364b185269 | fix: resolve test failures in TestRunRollup, TestExtractContentWithCSS, and TestExtractLinks | 2024-09-21 16:04:20 -05:00
Arik Jones (aider) | 952c2dda02 | refactor: update browser initialization in scraper tests | 2024-09-21 16:01:51 -05:00
Arik Jones | 73116e8d82 | Fix logging and other issues from preventing scraping | 2024-09-21 15:54:33 -05:00
Arik Jones | 160a15dbb1 | fix: Use logger instead of log. Move web subcommand initialization to root.go | 2024-09-19 11:44:27 -05:00
Arik Jones (aider) | 7f468a05bd | feat: install only Chromium browser | 2024-09-17 14:51:09 -05:00
Arik Jones (aider) | 4586b5daaa | fix: Install Playwright and browsers before initializing | 2024-09-17 14:48:15 -05:00
Arik Jones (aider) | 53dcd6eb71 | feat: Add support for exclusionary CSS paths in config.go | 2024-09-14 20:59:08 -05:00
Arik Jones (aider) | c1755836b5 | fix: Move HTML to Markdown conversion to scraper.go | 2024-09-14 20:55:35 -05:00
Arik Jones (aider) | 6f4750c900 | fix: Remove references to non-existent CSSLocator field in Config struct | 2024-09-14 20:36:31 -05:00
Arik Jones (aider) | 52c7de255d | feat: Implement scraping of multiple URLs with optional CSS locators and separate output files | 2024-09-14 20:35:35 -05:00
Arik Jones (aider) | 23508df6f4 | feat: Add optional logging to the scraper | 2024-09-14 19:59:02 -05:00
Arik Jones | 01d6b2f54f | fix: Improve page content extraction in scraper | 2024-09-14 19:59:01 -05:00
Arik Jones (aider) | 3378402fb9 | fix: Handle missing content in ProcessHTMLContent | 2024-09-14 19:43:58 -05:00
Arik Jones | 2ab0d74279 | fix: Update scraper to handle empty URLs | 2024-09-14 19:42:38 -05:00
Arik Jones (aider) | eaa7135eab | feat: Improve content extraction with fallback to body | 2024-09-14 17:05:05 -05:00
Arik Jones (aider) | 7cdd68d020 | feat: Separate include and exclude selectors in web scraper | 2024-09-14 16:59:59 -05:00
Arik Jones (aider) | 39e06ee9d5 | fix: remove space between minus and CSS path in parseSelectors | 2024-09-14 16:54:34 -05:00
Arik Jones (aider) | d66fd04016 | fix: Use - instead of ! to filter unwanted elements | 2024-09-14 16:53:42 -05:00
Arik Jones (aider) | 56d5a8a194 | refactor: Remove XPath support | 2024-09-14 16:51:18 -05:00
Arik Jones (aider) | 09f8ed07c2 | fix: Remove unused variable excludeXPaths in ExtractContentWithXPath function | 2024-09-14 16:50:34 -05:00
Arik Jones (aider) | f1af20e95e | feat: Add support for excluding child elements in content extraction | 2024-09-14 16:49:32 -05:00
Arik Jones (aider) | d0ee666b07 | refactor: Modify scraper to capture only the main content | 2024-09-14 15:20:15 -05:00
Arik Jones (aider) | 1a57be80fa | fix: Remove print media emulation and improve CSS selector extraction | 2024-09-14 15:14:53 -05:00
Arik Jones (aider) | ea12ad631c | fix: Fix assignment mismatch in ExtractContentWithCSS function | 2024-09-14 14:54:04 -05:00
Arik Jones (aider) | 885f3fc2b8 | feat: Add missing scraper functions | 2024-09-14 14:52:45 -05:00
Arik Jones | 0163c4e504 | Adds a configuration layer for use rollup.yml which may be preferred over CLI flags. | 2024-09-05 23:41:39 -05:00