Numbeo is the largest crowdsourced global database of quality-of-life data, including housing indicators, perceived crime rates, healthcare quality, transportation quality, and other statistics. This project's goal is to save time when searching for quality-of-life information about a particular country or city by using a web-scraping framework (in this case, the BeautifulSoup4 library) to extract the data directly from Numbeo's website.
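Under the hood, scraping a page like Numbeo's boils down to fetching the HTML and walking its tables with BeautifulSoup4. The sketch below illustrates the technique on a hypothetical inline HTML snippet (no live request is made, and the table structure is invented for illustration; it is not Numbeo's actual markup):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking the shape of a ranking table
html = """
<table id="t2">
  <tr><th>City</th><th>Quality of Life Index</th></tr>
  <tr><td>Vienna</td><td>185.2</td></tr>
  <tr><td>Zurich</td><td>183.9</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
# skip the header row, then read each cell's text
for tr in soup.find("table", id="t2").find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append((cells[0], float(cells[1])))

print(rows)  # [('Vienna', 185.2), ('Zurich', 183.9)]
```

The real scraper follows the same pattern, but fetches live pages and maps each table into a pandas dataframe.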
To install this package, first clone the repository to the directory of your choice using the following command:
git clone https://github.com/rafaelgreca/numbeo-scraper.git
Create a virtual environment (ideally using conda) and install the requirements using the following command:
conda create --name numbeo-scraper python=3.10.16
conda activate numbeo-scraper
pip install -r requirements.txt
Build the Docker image using the following command:
sudo docker build -f Dockerfile -t numbeo-scraper . --no-cache
Run the Docker container using the following command:
sudo docker run -it --name numbeo-scraper-container numbeo-scraper
Finally, run the following command inside the container:
python3 -m <YOUR_PYTHON_FILE_LOCATION>
Example (this is the same command as used with the virtual environment approach):
python3 -m examples.by_country.get_quality_of_life_data
- Cost of living index by country (check an example here) or by city (check an example here).
- Property price/investment index by country (check an example here) or by city (check an example here).
- Quality of life index by country (check an example here) or by city (check an example here).
- Crime index by country (check an example here) or by city (check an example here).
- Health care index by country (check an example here) or by city (check an example here).
- Pollution index by country (check an example here) or by city (check an example here).
- Traffic index by country (check an example here) or by city (check an example here).
- Historical data in a country (check an example here).
You can pass the variables that will be used to collect the desired data by creating a YAML file (such as the config.yaml file located in the root folder) and writing a piece of code like the one below (saved in a Python file):
from pathlib import Path

from src.core.utils import read_yaml_credentials_file
from src.schema.input import Input
from src.core.scraper import NumbeoScraper

if __name__ == "__main__":
    # reading the YAML file
    config = Input(
        **read_yaml_credentials_file(
            file_path=Path(__file__).resolve().parents[1],  # the folder where the config file is located
            file_name="config.yaml",  # the configuration file name
        )
    )

    scraper = NumbeoScraper(
        config=config,
    )

    # scrap() returns a list of tuples (each category is saved separately),
    # where the first element is the name of the dataframe
    # and the second one is the collected data
    dataframes = scraper.scrap()
    dataframe_name, data = dataframes[0]  # the name is used to identify the data

    print(f"\nDataframe '{dataframe_name}' has a shape of {data.shape}.")
    print(f"The first five rows of the dataset:\n{data.head(5)}\n")
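For reference, a config.yaml matching this code might look like the sketch below. The keys are inferred from the parameters documented later in this section; the actual accepted schema is defined by the library's Input class:

```yaml
categories: historical-data
years: 2021
mode: country
currency: EUR
historical_items:
  - 1 Pair of Jeans (Levis 501 Or Similar)
  - Banana (1kg)
countries:
  - China
  - France
  - United States
```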
Or you can pass the values directly, like this:
from src.schema.input import Input
from src.core.scraper import NumbeoScraper

if __name__ == "__main__":
    config = Input(
        categories="historical-data",
        years=2021,
        mode="country",
        currency="EUR",
        historical_items=[
            '1 Pair of Jeans (Levis 501 Or Similar)',
            'Banana (1kg)',
        ],
        countries=[
            'China',
            'France',
            'United States',
        ],
    )

    scraper = NumbeoScraper(
        config=config,
    )

    # scrap() returns a list of tuples (each category is saved separately),
    # where the first element is the name of the dataframe
    # and the second one is the collected data
    dataframes = scraper.scrap()
    dataframe_name, data = dataframes[0]  # the name is used to identify the data

    print(f"\nDataframe '{dataframe_name}' has a shape of {data.shape}.")
    print(f"The first five rows of the dataset:\n{data.head(5)}\n")
Available parameters that can/must be used:
- categories (a list of strings or a single string, mandatory): which type of data will be collected. You can see the available categories here.
- years (a list of integers or a single integer, mandatory): which years the data will be extracted from. You can see the available years here.
- mode (a string, mandatory): whether the data will be collected by country or by city. You can see the available modes here.
- currency (a string, optional): which currency the values will be displayed in. You can see the available currencies here. This parameter is optional; however, it must be used when the chosen category is historical-data with mode country, or when it is cost-of-living or property-investment with mode city.
- historical_items (a list of strings or a single string, optional): which items the historical data will be extracted from. You can see the available items here. This parameter is optional; however, it must be used when the chosen category is historical-data with mode country.
- countries (a list of strings or a single string, optional): which countries the data will be extracted from. You can see the available countries here.
- cities (a list of strings or a single string, mandatory when mode city is chosen): which cities the data will be extracted from.
Check the examples folder to see more examples of how to use this library.
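The currency, historical_items, and cities requirements above can be summarized in a small validation sketch. This is plain Python, independent of the library's actual Input class, and the grouping of the currency rule is my reading of the description above:

```python
def validate(categories, years, mode, currency=None,
             historical_items=None, countries=None, cities=None):
    """Sketch of the documented parameter rules (not the library's validator)."""
    cats = categories if isinstance(categories, list) else [categories]

    if mode not in ("country", "city"):
        raise ValueError("mode must be 'country' or 'city'")

    # currency is required for historical-data by country, and for
    # cost-of-living or property-investment by city
    needs_currency = ("historical-data" in cats and mode == "country") or (
        mode == "city"
        and not {"cost-of-living", "property-investment"}.isdisjoint(cats)
    )
    if needs_currency and currency is None:
        raise ValueError("currency is required for this category/mode combination")

    # historical_items is required for historical-data by country
    if "historical-data" in cats and mode == "country" and historical_items is None:
        raise ValueError("historical_items is required for historical-data by country")

    # cities is mandatory whenever mode is 'city'
    if mode == "city" and cities is None:
        raise ValueError("cities is mandatory when mode is 'city'")

    return True


print(validate("crime", 2021, "country", countries=["France"]))  # True
```

Passing an invalid combination, such as historical-data by country without a currency, raises a ValueError in this sketch, mirroring the constraints listed above.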
Run the following command on the root folder:
python3 -m unittest discover -p 'test_*.py'
- Add a feature to get the food prices by country or by city.
- Fix logging (currently the logs are shown directly in the terminal rather than saved to a file).
- Improve test cases, especially to validate parameter values and typing.
- Test the code using Docker.
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Distributed under the MIT License. See LICENSE for more information.
Rafael Greca Vieira - GitHub - LinkedIn - [email protected]
Numbeo is the world's largest crowdsourced cost-of-living database, helping people from all over the world plan their travels and find a new place to call home. I want to express my profound gratitude to everyone who works behind the scenes to make it possible.