Add AI to perform the scraping #4

Open
wants to merge 3 commits into
base: main
6 changes: 6 additions & 0 deletions .gitignore
@@ -3,6 +3,12 @@ __pycache__/
*.py[cod]
*$py.class

#PynewFiles
pynews*
format.json
export.py
worker.py

# C extensions
*.so

22 changes: 22 additions & 0 deletions Dockerfile
@@ -0,0 +1,22 @@
FROM ubuntu:22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    python3-pip \
    python3.10 \
    tzdata && \
    ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime && \
    dpkg-reconfigure --frontend noninteractive tzdata && \
    apt-get clean && rm -rf /var/lib/apt/lists/* && \
    mkdir -p app && \
    pip3 install -U crawl4ai && \
    playwright install && \
    playwright install-deps

WORKDIR /app

COPY ./app/. .
COPY requirements.txt .

RUN pip3 install -r requirements.txt
56 changes: 46 additions & 10 deletions README.md
@@ -2,26 +2,62 @@

This project is a scraper that checks which Python libraries were updated in the last month and lists them, making it easier to identify libraries with major version releases.

This scraper uses AI so that it does not need to be updated constantly.
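
As a rough illustration of the check described above, the sketch below flags versions whose leading number increased within a recent window. It is not the project's implementation: the function names, the 30-day window, and the sample data are all assumptions, and the parsing only handles plain dotted versions such as `2.0.1`.

```python
from datetime import date, timedelta


def major_of(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


def recent_major_releases(releases, today, window_days=30):
    """Return versions released in the window that bump the major number.

    `releases` is a list of (version, release_date) tuples, oldest first.
    """
    cutoff = today - timedelta(days=window_days)
    found = []
    # Compare each release with the one immediately before it.
    for (prev_version, _), (version, released) in zip(releases, releases[1:]):
        in_window = cutoff <= released <= today
        if in_window and major_of(version) > major_of(prev_version):
            found.append(version)
    return found


# Illustrative data only, not taken from any real package index.
sample = [
    ("1.26.4", date(2024, 2, 5)),
    ("2.0.0", date(2024, 6, 16)),
    ("2.0.1", date(2024, 7, 21)),
]
print(recent_major_releases(sample, today=date(2024, 7, 1)))  # ['2.0.0']
```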

# How to use

To create a virtual environment, run
## Installing Docker and Docker Compose

### Ubuntu

1. **Install Docker:**

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04

2. **Install Docker Compose:**

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-22-04


### Windows

1. **Install Docker:**

https://docs.docker.com/desktop/setup/install/windows-install/

2. **Install Docker Compose:**

https://docs.docker.com/compose/install/


### Create a Gemini API key with Google

- https://aistudio.google.com/
<p>

Update the value in jeannie.py
```
python -m venv .venv
...
GOOGLE_API_KEY = "<YOUR-API-KEY>"
...
```
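
Hard-coding the key in `jeannie.py` makes it easy to commit by accident. One possible alternative, shown only as a sketch, is to read it from an environment variable. The helper name `load_api_key` is hypothetical, and reusing `GOOGLE_API_KEY` as the variable name is an assumption.

```python
import os

PLACEHOLDER = "<YOUR-API-KEY>"


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from the environment, rejecting empty or placeholder values."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set the {env_var} environment variable to a real API key")
    return key
```

With Docker Compose, such a variable could be passed to the container through the service's `environment` section or an env file.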

To activate the virtual environment, run
```
source .venv/bin/activate
## Build the container
```sh
docker compose build
```

To install the project's dependencies in the virtual environment, run
## To search for releases, run the command below and wait for the instructions
```
pip install -r requirements.txt
docker compose run --rm --remove-orphans pynews python3 /app/app/getNews.py releases
```

To run the scraper, run
## To create the summaries, run
```
python getNews.py
docker compose run --rm --remove-orphans pynews python3 /app/app/getNews.py slides
```
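
The two `docker compose run` commands above differ only in the final argument (`releases` or `slides`). How `getNews.py` actually dispatches on that argument is not shown in this diff. The sketch below is one hypothetical way to do it with `argparse`, with placeholder return values standing in for the real work.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one mode: 'releases' or 'slides'."""
    parser = argparse.ArgumentParser(prog="getNews.py")
    parser.add_argument(
        "mode",
        choices=["releases", "slides"],
        help="'releases' searches for new releases; 'slides' builds the summaries",
    )
    return parser


def main(argv=None) -> str:
    """Parse the arguments and dispatch to the matching (placeholder) step."""
    args = build_parser().parse_args(argv)
    if args.mode == "releases":
        return "searching for releases"  # placeholder for the real scraping step
    return "building slides"  # placeholder for the real summary step
```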
<p>
<p>
<p>

To deactivate the virtual environment, run `deactivate`.
## This script depends on COHERE AI. Like every other AI, it does not yet behave reliably, so this script should be used under supervision
Empty file added app/__init__.py
Empty file.
173 changes: 173 additions & 0 deletions app/bibliotecas.py
@@ -0,0 +1,173 @@
bibliotecas = {
"Requests": {
"library_name": "Requests",
"releases_url": "https://pypi.org/project/requests/",
"logo": "https://requests.readthedocs.io/en/latest/_static/requests-sidebar.png",
"repository": "https://github.com/psf/requests",
},
"Scikit-learn": {
"library_name": "Scikit-learn",
"releases_url": "https://pypi.org/project/scikit-learn/",
"logo": "https://scikit-learn.org/stable/_static/scikit-learn-logo-small.png",
"repository": "https://github.com/scikit-learn/scikit-learn",
},
"Numpy": {
"library_name": "Numpy",
"releases_url": "https://pypi.org/project/numpy/",
"logo": "https://numpy.org/devdocs/_static/numpylogo.svg",
"repository": "https://github.com/numpy/numpy",
},
"MatPlotLib": {
"library_name": "MatPlotLib",
"releases_url": "https://pypi.org/project/matplotlib/",
"logo": "https://matplotlib.org/stable/_static/logo_light.svg",
"repository": "https://github.com/matplotlib/matplotlib",
},
"AIOHttp": {
"library_name": "AIOHttp",
"releases_url": "https://pypi.org/project/aiohttp/",
"logo": "https://docs.aiohttp.org/en/stable/_static/aiohttp-plain.svg",
"repository": "https://github.com/aio-libs/aiohttp",
},
"Pandas": {
"library_name": "Pandas",
"releases_url": "https://pypi.org/project/pandas/",
"logo": "https://pandas.pydata.org/static/img/pandas_mark.svg",
"repository": "https://github.com/pandas-dev/pandas",
},
"FastAPI": {
"library_name": "FastAPI",
"releases_url": "https://pypi.org/project/fastapi/",
"logo": "https://camo.githubusercontent.com/4ebb06d037b495f2c4c67e0ee4599f747e94e6323ece758a7da27fbbcb411250/68747470733a2f2f666173746170692e7469616e676f6c6f2e636f6d2f696d672f6c6f676f2d6d617267696e2f6c6f676f2d7465616c2e706e67",
"repository": "https://github.com/fastapi/fastapi",
},
"Django": {
"library_name": "Django",
"releases_url": "https://pypi.org/project/Django/",
"logo": "https://www.djangoproject.com/m/img/logos/django-logo-positive.png",
"repository": "https://github.com/django/django",
},
"Seaborn": {
"library_name": "Seaborn",
"releases_url": "https://pypi.org/project/seaborn/",
"logo": "https://seaborn.pydata.org/_images/logo-wide-lightbg.svg",
"repository": "https://github.com/mwaskom/seaborn",
},
"TensorFlow": {
"library_name": "TensorFlow",
"releases_url": "https://pypi.org/project/tensorflow/",
"logo": "https://www.tensorflow.org/images/tf_logo_social.png",
"repository": "https://github.com/tensorflow/tensorflow",
},
"Keras": {
"library_name": "Keras",
"releases_url": "https://pypi.org/project/keras/",
"logo": "https://keras.io/img/logo.png",
"repository": "https://github.com/keras-team/keras",
},
"PyTorch": {
"library_name": "PyTorch",
"releases_url": "https://pypi.org/project/torch/",
"logo": "https://pytorch.org/assets/images/pytorch-logo.png",
"repository": "https://github.com/pytorch/pytorch",
},
"SQLAlchemy": {
"library_name": "SQLAlchemy",
"releases_url": "https://pypi.org/project/SQLAlchemy/",
"logo": "https://www.sqlalchemy.org/img/sqla_logo.png",
"repository": "https://github.com/sqlalchemy/sqlalchemy",
},
"BeautifulSoup": {
"library_name": "BeautifulSoup",
"releases_url": "https://pypi.org/project/beautifulsoup4/",
"logo": "https://www.crummy.com/software/BeautifulSoup/10.1.jpg",
"repository": None,
},
"LangChain": {
"library_name": "LangChain",
"releases_url": "https://pypi.org/project/langchain/",
"logo": "https://python.langchain.com/img/brand/wordmark-dark.png",
"repository": "https://github.com/langchain-ai/langchain",
},
"CrewAI": {
"library_name": "CrewAI",
"releases_url": "https://pypi.org/project/crewai/",
"logo": "https://cdn.prod.website-files.com/66cf2bfc3ed15b02da0ca770/66d07240057721394308addd_Logo%20(1).svg",
"repository": "https://github.com/crewAIInc/crewAI",
},
"Flask": {
"library_name": "Flask",
"releases_url": "https://pypi.org/project/Flask/",
"logo": "https://flask.palletsprojects.com/en/stable/_static/flask-vertical.png",
"repository": "https://github.com/pallets/flask",
},
"Pygame": {
"library_name": "Pygame",
"releases_url": "https://pypi.org/project/pygame/",
"logo": "https://www.pygame.org/images/logo_lofi.png",
"repository": "https://github.com/pygame/pygame",
},
"Thinker": {
"library_name": "Thinker",
"releases_url": "https://pypi.org/project/thinker/",
"logo": "https://keras.io/img/logo.png",
"repository": "https://github.com/mehmetkose/thinker",
},
"Plotly": {
"library_name": "Plotly",
"releases_url": "https://pypi.org/project/plotly/",
"logo": "https://plotly.com/static/img/logos/plotly-logomark.svg",
"repository": "https://github.com/plotly/plotly.py",
},
"MlForecast": {
"library_name": "MlForecast",
"releases_url": "https://pypi.org/project/mlforecast/",
"logo": "https://raw.githubusercontent.com/Nixtla/mlforecast/main/nbs/figs/logo.png",
"repository": "https://github.com/Nixtla/mlforecast",
},
"GeoPandas": {
"library_name": "GeoPandas",
"releases_url": "https://pypi.org/project/geopandas/",
"logo": "https://geopandas.org/en/stable/_static/geopandas_logo_web.svg",
"repository": "https://github.com/geopandas/geopandas",
},
"AirFlow": {
"library_name": "AirFlow",
"releases_url": "https://pypi.org/project/apache-airflow/",
"logo": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/de/AirflowLogo.png/800px-AirflowLogo.png?20191014185111",
"repository": "https://github.com/apache/airflow",
},
"PySpark": {
"library_name": "PySpark",
"releases_url": "https://pypi.org/project/pyspark/",
"logo": "https://spark.apache.org/docs/latest/api/python/_static/spark-logo-reverse.png",
"repository": "https://github.com/apache/spark/tree/master/python",
},
"Gym": {
"library_name": "Gym",
"releases_url": "https://pypi.org/project/gym/",
"logo": "https://www.gymlibrary.dev/_static/img/gym_logo_black.svg",
"repository": "https://github.com/Farama-Foundation/Gymnasium",
},
"HyperOpt": {
"library_name": "HyperOpt",
"releases_url": "https://pypi.org/project/hyperopt/",
"logo": "https://camo.githubusercontent.com/d9cabe82cdc7bff598f84d61b0a8921cd5c3ceb0716b03399fc31db1a2a23182/68747470733a2f2f692e706f7374696d672e63632f54506d66665772702f68797065726f70742d6e65772e706e67",
},
"Streamlit": {
"library_name": "Streamlit",
"releases_url": "https://pypi.org/project/streamlit/",
"logo": "https://streamlit.io/images/brand/streamlit-mark-color.png",
},
"Crawl4ai": {
"library_name": "Crawl4ai",
"releases_url": "https://crawl4ai.com/mkdocs/blog/",
"logo": "https://star-history.com/#unclecode/crawl4ai&Date",
},
"ScanAPI": {
"library_name": "ScanAPI",
"releases_url": "https://pypi.org/project/scanapi/",
"logo": "https://avatars.githubusercontent.com/u/59395469?s=200&v=4",
"repository": "https://github.com/scanapi/scanapi",
},
}
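
Most entries above carry the same four keys, but a few (for example `Streamlit`) omit `repository`. A small check like the following could surface such gaps before the scraper runs. Treating all four keys as required is an assumption, and the helper name `find_incomplete_entries` is hypothetical.

```python
# Assumed set of required keys; the PR does not state which keys are mandatory.
REQUIRED_KEYS = ("library_name", "releases_url", "logo", "repository")


def find_incomplete_entries(libraries: dict) -> dict:
    """Map each entry name to the list of required keys it lacks."""
    problems = {}
    for name, entry in libraries.items():
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems[name] = missing
    return problems


# Two entries copied from the dictionary above, used here as sample input.
sample = {
    "Requests": {
        "library_name": "Requests",
        "releases_url": "https://pypi.org/project/requests/",
        "logo": "https://requests.readthedocs.io/en/latest/_static/requests-sidebar.png",
        "repository": "https://github.com/psf/requests",
    },
    "Streamlit": {
        "library_name": "Streamlit",
        "releases_url": "https://pypi.org/project/streamlit/",
        "logo": "https://streamlit.io/images/brand/streamlit-mark-color.png",
    },
}
print(find_incomplete_entries(sample))  # {'Streamlit': ['repository']}
```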
1 change: 1 addition & 0 deletions app/cacheVariables.py
@@ -0,0 +1 @@
pynews = {}