Commit Graph

14 Commits

Author              SHA1        Date                        Message
Arik Jones (aider)  3378402fb9  2024-09-14 19:43:58 -05:00  fix: Handle missing content in ProcessHTMLContent
Arik Jones          2ab0d74279  2024-09-14 19:42:38 -05:00  fix: Update scraper to handle empty URLs
Arik Jones (aider)  eaa7135eab  2024-09-14 17:05:05 -05:00  feat: Improve content extraction with fallback to body
Arik Jones (aider)  7cdd68d020  2024-09-14 16:59:59 -05:00  feat: Separate include and exclude selectors in web scraper
Arik Jones (aider)  39e06ee9d5  2024-09-14 16:54:34 -05:00  fix: remove space between minus and CSS path in parseSelectors
Arik Jones (aider)  d66fd04016  2024-09-14 16:53:42 -05:00  fix: Use - instead of ! to filter unwanted elements
Arik Jones (aider)  56d5a8a194  2024-09-14 16:51:18 -05:00  refactor: Remove XPath support
Arik Jones (aider)  09f8ed07c2  2024-09-14 16:50:34 -05:00  fix: Remove unused variable excludeXPaths in ExtractContentWithXPath function
Arik Jones (aider)  f1af20e95e  2024-09-14 16:49:32 -05:00  feat: Add support for excluding child elements in content extraction
Arik Jones (aider)  d0ee666b07  2024-09-14 15:20:15 -05:00  refactor: Modify scraper to capture only the main content
Arik Jones (aider)  1a57be80fa  2024-09-14 15:14:53 -05:00  fix: Remove print media emulation and improve CSS selector extraction
Arik Jones (aider)  ea12ad631c  2024-09-14 14:54:04 -05:00  fix: Fix assignment mismatch in ExtractContentWithCSS function
Arik Jones (aider)  885f3fc2b8  2024-09-14 14:52:45 -05:00  feat: Add missing scraper functions
Arik Jones          0163c4e504  2024-09-05 23:41:39 -05:00  Adds a configuration layer for use rollup.yml which may be preferred over CLI flags.