
Error and no URLs found #19

@umasse

Description

This project looks good, but somehow I can't get it to work at all:

markdown-crawler --debug -b test "https://www.hooplaimpro.com/improv-encyclopedia.html"

DEBUG:markdown_crawler:🐞 Debugging enabled
INFO:markdown_crawler:🕸️ Crawling https://www.hooplaimpro.com/improv-encyclopedia.html at ⏬ depth 3 with 🧡 5 threads
DEBUG:markdown_crawler:Crawling: https://www.hooplaimpro.com/improv-encyclopedia.html
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.hooplaimpro.com:443
DEBUG:markdown_crawler:Started thread 1 of 5
DEBUG:markdown_crawler:Started thread 2 of 5
DEBUG:markdown_crawler:Started thread 3 of 5
DEBUG:markdown_crawler:Started thread 4 of 5
DEBUG:markdown_crawler:Started thread 5 of 5
DEBUG:urllib3.connectionpool:https://www.hooplaimpro.com:443 "GET /improv-encyclopedia.html HTTP/1.1" 200 None
INFO:markdown_crawler:Created 📝 improv-encyclopedia-html.md
/home/urko/.virtualenvs/markdown-crawler/lib/python3.12/site-packages/markdown_crawler/__init__.py:201: UserWarning: Ignoring nested list ['body'] to avoid the possibility of infinite recursion.
  for target in soup.find_all(target_links):
DEBUG:markdown_crawler:Found 0 child URLs
INFO:markdown_crawler:🏁 All threads have finished

Another crawler was able to follow all the links on that page without any problem.
What am I missing?
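The UserWarning in the log comes from the `soup.find_all(target_links)` call and reports that `['body']` was ignored as a nested list. That suggests (an assumption, not confirmed by the maintainers) that `target_links` reached `find_all` as something like `[['body']]` rather than `['body']`, and the log's `Found 0 child URLs` is consistent with no target element matching. The standard-library sketch below illustrates the pitfall and one way to guard against it. `flatten_targets`, `LinkCollector` and `find_child_links` are hypothetical names, not part of markdown-crawler, and `html.parser` stands in for BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def flatten_targets(targets):
    """Hypothetical helper: flatten one level of nesting, e.g. [['body']] -> ['body']."""
    flat = []
    for t in targets:
        if isinstance(t, (list, tuple)):
            flat.extend(t)
        else:
            flat.append(t)
    return flat


class LinkCollector(HTMLParser):
    """Hypothetical collector: gathers absolute hrefs found inside any target tag."""

    def __init__(self, base_url, targets):
        super().__init__()
        self.base_url = base_url
        # Flatten first; a nested list would never match a tag-name string.
        self.targets = set(flatten_targets(targets))
        self.depth = 0  # number of currently open target tags
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.targets:
            self.depth += 1
        if tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.targets and self.depth > 0:
            self.depth -= 1


def find_child_links(html, base_url, targets):
    """Return the links that appear inside the given target tags."""
    parser = LinkCollector(base_url, targets)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<html><body><a href="/a.html">A</a><a href="https://example.com/b">B</a></body></html>'
    # A plain membership test shows the pitfall: a tag name never equals a list.
    print("body" in [["body"]])
    # With flattening, the nested target list still finds both links.
    print(find_child_links(page, "https://www.hooplaimpro.com/improv-encyclopedia.html", [["body"]]))
```

Run directly, the first line prints `False`. Without the flattening step, `[['body']]` would fail the `tag in self.targets` check for every tag (and `set()` would reject the inner list as unhashable), so a flat list such as `['body']` is what the tag lookup needs.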
