estela CLI Quickstart
Getting Help
To see all available commands, run:
$ estela
For help on a specific command, append the `-h` or `--help` flag, e.g.:
$ estela create job --help
Basic Usage
To start using the estela CLI, you first need to log in:
$ estela login
The estela CLI will prompt for your credentials. You should see the following output:
$ estela login
Host [http://localhost]:
Username: admin
Password:
Successful login. API Token stored in ~/.estela.yaml.
This will save your estela API key to the file `~/.estela.yaml`; it is needed to access the projects associated with your account.
If you have installed estela locally, run the following command to obtain the host of the estela API (for this command to work, you should first run `minikube tunnel`):
$ kubectl get service estela-django-api-service -o custom-columns=':status.loadBalancer.ingress[0].ip' \
    | tr -d '[:space:]' \
    | paste -d "/" <(echo "http:/") -
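To see what the `tr`/`paste` pipeline above does, here is the same combination applied to a sample value (192.168.49.2 is only a placeholder standing in for the IP that `kubectl` would print, not a real cluster address):

```shell
# tr strips all whitespace (including the trailing newline) from the IP,
# and paste joins "http:/" and the IP with a "/" delimiter, yielding a URL.
echo " 192.168.49.2 " \
  | tr -d '[:space:]' \
  | paste -d "/" <(echo "http:/") -
# prints: http://192.168.49.2
```

Note that the process substitution `<(...)` requires bash.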
Note: You can use the superuser credentials that you set with `make createsuperuser` to log in.
Creating a project
In estela, a project is an identifier that groups spiders. It is linked to the Docker image of a Scrapy project. A project's spiders are extracted automatically from the Scrapy project.
To create a project, use the command `estela create project <project_name>`, which, on success, should return a message like the following:
$ estela create project proj_test
project/proj_test created.
Hint: Run 'estela init 23ea584d-f39c-85bd-74c1-9b725ffcab1d7' to activate this project
With this, we have created a project in estela. Note the hint in the last line of the output. It shows us the ID of the project we have just created in UUID format.
A project cannot run spiders if it is not linked to the Docker image of a Scrapy Project.
Linking a Scrapy project
To link a project, navigate to a Scrapy project with the following structure:
scrapy_project_dir
----scrapy_project/
    ----spiders/
    ----downloader.py
    ----items.py
    ----middlewares.py
    ----pipelines.py
    ----settings.py
----scrapy.cfg
----requirements.txt
Then, run the suggested command to activate the project:
$ estela init 23ea584d-f39c-85bd-74c1-9b725ffcab1d7
Linking a Requests Project
If you are using a Requests project instead of a Scrapy project, you can link it to Estela by following these steps:
- Ensure you have created a Requests project using Estela Requests. Refer to the Estela Requests documentation for instructions on creating a project.
- Navigate to the root directory of your Requests project.
- Run the following command to activate the project, replacing 23ea584d-f39c-85bd-74c1-9b725ffcab1d7 with the ID of your Estela project:

$ estela init 23ea584d-f39c-85bd-74c1-9b725ffcab1d7 -p requests
This command links your Requests project to the corresponding Estela project, allowing you to utilize Estela's features for managing and running your project.
In order to be discoverable, **spiders should reside in the project's root directory**. However, please note that this will be enhanced in the future to provide greater flexibility.
This will create the files `.estela/Dockerfile-estela.yaml` and `estela.yaml` in your project directory. `estela.yaml` contains the project ID and the Docker image's name in the AWS registry. This file also configures your project, allowing you to change the Python version, the requirements file path, and the files to ignore when deploying (like your virtual environment).
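As a rough illustration only (the key names below are hypothetical, so consult the `estela.yaml` generated in your own project for the exact schema), the file groups settings along these lines:

```yaml
# Illustrative sketch of an estela.yaml -- key names are assumptions.
project:
  pid: 23ea584d-f39c-85bd-74c1-9b725ffcab1d7   # project ID printed by estela init
  python: "3.9"                                # Python version used to build the image
  requirements: requirements.txt               # path to the requirements file
  ignore:
    - .venv/                                   # files/dirs excluded when deploying
```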
Alternatively, if you created the project via the web interface, you can directly use the `estela init <project_id>` command with the project ID that you can find on the project detail page.
We have successfully linked our estela project with our Scrapy project.
Deploying a project
This is a simple but essential step. Once the estela and Scrapy projects are linked, we can build the Docker image and upload it to the AWS registry. This whole process is handled automatically and scheduled by the API with the command:
$ estela deploy
You must run this command in the root directory of your Scrapy project (where the `estela.yaml` file is). It will check whether the Dockerfile needs to be updated because of changes in the `estela.yaml` file. Then, it will zip your Scrapy project and upload it to the API, which will take care of the rest of the process.
$ estela deploy
.estela/Dockerfile-estela not changes to update.
✅ Project uploaded successfully. Deploy 19 underway.
After the deployment is complete, you can see the spiders in your project with
$ estela list spider
NAME SID
quotes 101
And you can create jobs and cronjobs using the estela CLI with `estela create job <SID>` and `estela create cronjob <CRONTAB_SCHEDULE> <SID>`.
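For example, assuming the spider with SID 101 from the listing above, you could run it once, or schedule it with a standard crontab expression (the daily 08:00 schedule here is only illustrative):

$ estela create job 101
$ estela create cronjob "0 8 * * *" 101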
You can see the list of jobs that have run or are running for a spider with:
$ estela list job <SID>
JID STATUS TAGS ARGS ENV VARS CREATED
1943 Completed 2022-03-18 14:40
1850 Completed 2022-03-10 14:14
You can get the scraped items even while the spider is running by supplying the job ID:
$ estela data <JID> <SID>
✅ Data retrieved successfully.
✅ Data saved successfully.
This will save the data in JSON format in a `project_data/` directory. You can also retrieve the data in CSV format by adding the option `--format csv`.
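For example, using the job and spider IDs from the listings above, the items of job 1943 of spider 101 could be fetched as CSV:

$ estela data 1943 101 --format csv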