In this section, we present how to use a web crawler within MindsDB. A web crawler is a computer program or automated script that browses the internet and navigates through websites, web pages, and web content to gather data. Within MindsDB, a web crawler can be used to harvest data for training models, building domain-specific chatbots, or fine-tuning LLMs.
This handler does not require any connection parameters. Here is how to initialize a web crawler:
CREATE DATABASE my_web
WITH ENGINE = 'web';
If you installed MindsDB locally via pip, you need to install this handler's dependencies manually. To do so, go to the handler's folder (mindsdb/integrations/handlers/web_handler) and run this command: pip install -r requirements.txt.