-
Notifications
You must be signed in to change notification settings - Fork 49
Description
I wanted to try out this package for an internal RAG presentation based on some of our company website data. Our website has the typical structure: header, main content and footer. Most of the links are in the header and footer part, but if I set target_content
to only the main
container, no links are getting found.
Is there a way to have all URLs collected outside of the container i ultimately want to parse into markdown? When i don't specify the target i am getting quite a mess of navigation and footer link data that repeats on every page (which would have to be cleaned up for every single page)..
Thanks in advance!
Edit: I think it was actually a combination of the domain, base path flags and the target content that returns zero results. once I removed all the extra flags, the target content worked as advertised.