This repository was archived by the owner on Aug 25, 2024. It is now read-only.
New flags:
- `only-main-content`: defaults to false. When enabled, it removes
script, style (and other) tags from the emitted document. This is
particularly helpful for detecting actual semantic changes to the
pages, as opposed to incidental changes (script versioning, cache busting, etc.)
- `emit-content-diff`: a list, defaulting to all content diff types. You can
use it to choose which content diff types the source emits, if available. For
example, to not emit `content_unchanged`, you can set
`emit-content-diff: ['new', 'content_diff']`
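To illustrate why `only-main-content` helps, here is a minimal sketch, assuming the flag simply drops `<script>`, `<style>` and (as a guess for "and others") `<noscript>` blocks before the page is compared. The class name `MainContentFilter` and its `apply` method are hypothetical and not taken from the LangStream code; the sketch uses a standard-library regex, whereas a real crawler would more likely use an HTML parser.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the behaviour described for `only-main-content`. */
public class MainContentFilter {

    // Matches an opening <script>, <style> or <noscript> tag, its body, and the
    // matching closing tag. (?i) = case-insensitive, (?s) = '.' also matches newlines.
    // Assumption: these three tags stand in for "script, style (and others)".
    private static final Pattern NON_CONTENT_BLOCK =
            Pattern.compile("(?is)<(script|style|noscript)\\b[^>]*>.*?</\\1\\s*>");

    /** Returns the HTML with non-content blocks removed when the flag is enabled. */
    public static String apply(String html, boolean onlyMainContent) {
        if (!onlyMainContent) {
            return html; // flag defaults to false: emit the page unchanged
        }
        return NON_CONTENT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String v1 = "<html><script src=\"app.js?v=1\"></script><p>Hello</p></html>";
        String v2 = "<html><script src=\"app.js?v=2\"></script><p>Hello</p></html>";
        // The raw pages differ only by a cache-busting query string: prints false.
        System.out.println(v1.equals(v2));
        // After stripping the script block they are identical: prints true.
        System.out.println(apply(v1, true).equals(apply(v2, true)));
    }
}
```

With the flag off, the two versions would be reported as changed content; with it on, only edits to the visible markup (here `<p>Hello</p>`) would register as a diff.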
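The `emit-content-diff` filter can be sketched as a simple allow-list check. This assumes the only diff types are the three named above (`new`, `content_diff`, `content_unchanged`) and that leaving the option unset (`null` here) means "emit all". The class `ContentDiffFilter` and its `shouldEmit` method are hypothetical illustrations, not the source's actual API.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the allow-list described for `emit-content-diff`. */
public class ContentDiffFilter {

    // Diff types named in the commit message; the real set may differ.
    static final Set<String> ALL_DIFF_TYPES =
            Set.of("new", "content_diff", "content_unchanged");

    private final Set<String> allowed;

    public ContentDiffFilter(List<String> emitContentDiff) {
        // Assumption: an unset option (null) falls back to emitting every diff type.
        this.allowed = (emitContentDiff == null) ? ALL_DIFF_TYPES : Set.copyOf(emitContentDiff);
    }

    /** True if a document with this diff type should be emitted. */
    public boolean shouldEmit(String diffType) {
        return allowed.contains(diffType);
    }

    public static void main(String[] args) {
        ContentDiffFilter defaults = new ContentDiffFilter(null);
        ContentDiffFilter filtered = new ContentDiffFilter(List.of("new", "content_diff"));
        System.out.println(defaults.shouldEmit("content_unchanged")); // true
        System.out.println(filtered.shouldEmit("content_unchanged")); // false
        System.out.println(filtered.shouldEmit("content_diff"));      // true
    }
}
```

An explicit empty list is treated here as "emit nothing"; whether the real option behaves that way is not stated in the commit.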
langstream-agents/langstream-agent-webcrawler/src/main/java/ai/langstream/agents/webcrawler/WebCrawlerSource.java
langstream-agents/langstream-agent-webcrawler/src/main/java/ai/langstream/agents/webcrawler/crawler/WebCrawler.java
Lines changed: 6 additions & 0 deletions
@@ -270,6 +270,12 @@ public boolean runCycle() throws Exception {
langstream-agents/langstream-agent-webcrawler/src/main/java/ai/langstream/agents/webcrawler/crawler/WebCrawlerConfiguration.java
Lines changed: 22 additions & 2 deletions
@@ -40,8 +40,28 @@ public class WebCrawlerConfiguration {