Add AI to do the scraping #4

38 changes: 36 additions & 2 deletions README.md
@@ -2,6 +2,8 @@

This project is a scraper that checks which Python libraries were updated in the last month and lists them, making it easier to identify libraries with major version releases.
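
The check described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names `major_of` and `is_recent_major_release` are hypothetical, the 30-day window is an assumed reading of "last month", and version strings are assumed to be plain dotted numbers such as `2.1.0`.

```python
from datetime import datetime, timedelta, timezone


def major_of(version: str) -> int:
    """Return the leading numeric component of a version such as '2.1.0'."""
    return int(version.split(".")[0])


def is_recent_major_release(previous, current, released_at, now, window_days=30):
    """True when `current` bumps the major number and was released within the window."""
    recent = now - released_at <= timedelta(days=window_days)
    return recent and major_of(current) > major_of(previous)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
released = datetime(2025, 1, 15, tzinfo=timezone.utc)
print(is_recent_major_release("1.26.4", "2.0.0", released, now))
```

Versions with prefixes or epochs (for example `v2.0` or `1!2.0`) would need a proper version parser instead of the string split used here.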

This scraper uses AI so that it does not need to be constantly updated.

# How to use

To create a virtual environment, run
@@ -19,9 +21,41 @@ To install the project dependencies in the virtual environment, run
pip install -r requirements.txt
```

To run the scraper, run
To complete the Crawl4AI installation
```
# Install the package
pip install -U crawl4ai

# Run post-installation setup
crawl4ai-setup

# Verify your installation
crawl4ai-doctor
```
Create an API key with Cohere
- https://docs.cohere.com/
<p>

Update the value in prompt.py
```
...
os.environ["COHERE_API_KEY"] = "<YOUR-API-KEY>"
...
```
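
As an alternative to hard-coding the key, it could be read from the environment and validated before use. This is only a sketch under stated assumptions: `load_cohere_api_key` is a hypothetical helper that is not part of this PR, and it treats the `<YOUR-API-KEY>` placeholder as "not configured".

```python
import os


def load_cohere_api_key(env=os.environ):
    """Return the Cohere API key, failing early if it is missing or a placeholder."""
    key = env.get("COHERE_API_KEY", "").strip()
    if not key or key == "<YOUR-API-KEY>":
        raise RuntimeError("Set COHERE_API_KEY before running the scraper.")
    return key
```

With this approach the key would be exported in the shell before running the scripts, which keeps it out of version control.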
Install Tkinter
```cmd
sudo apt-get install python3-tk
```

To search for releases, run the command below and wait for the instructions
```
python getNews.py
python getNews.py releases
```

To create the summaries, run
```
python getNews.py slides
```
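
The two subcommands above suggest a small command-line dispatcher. `getNews.py` is not shown in this diff, so the following is only a hypothetical sketch of how such a dispatch could look with `argparse`; the function names and the returned strings are placeholders, not the project's behavior.

```python
import argparse


def build_parser():
    """Build a parser that accepts exactly one of the two documented commands."""
    parser = argparse.ArgumentParser(prog="getNews.py")
    parser.add_argument(
        "command",
        choices=["releases", "slides"],
        help="'releases' searches for new releases, 'slides' creates the summaries",
    )
    return parser


def main(argv=None):
    """Dispatch on the chosen command; the return values stand in for real work."""
    args = build_parser().parse_args(argv)
    if args.command == "releases":
        return "searching for releases"
    return "creating summaries"
```

Calling `main(["releases"])` or `main(["slides"])` selects the corresponding branch, and any other argument makes `argparse` exit with a usage message.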


To deactivate the virtual environment, run `deactivate`.
13 changes: 0 additions & 13 deletions bibliotecas.list

This file was deleted.

142 changes: 142 additions & 0 deletions bibliotecas.py
@@ -0,0 +1,142 @@
bibliotecas = {
"Requests": {
"library_name": "Requests",
"releases_url": "https://pypi.org/project/requests/",
"logo": "https://requests.readthedocs.io/en/latest/_static/requests-sidebar.png",
},
"Scikit-learn": {
"library_name": "Scikit-learn",
"releases_url": "https://pypi.org/project/scikit-learn/",
"logo": "https://scikit-learn.org/stable/_static/scikit-learn-logo-small.png",
},
"Numpy": {
"library_name": "Numpy",
"releases_url": "https://pypi.org/project/numpy/",
"logo": "https://numpy.org/devdocs/_static/numpylogo.svg",
},
"MatPlotLib": {
"library_name": "MatPlotLib",
"releases_url": "https://pypi.org/project/matplotlib/",
"logo": "https://matplotlib.org/stable/_static/logo_light.svg",
},
"AIOHttp": {
"library_name": "AIOHttp",
"releases_url": "https://pypi.org/project/aiohttp/",
"logo": "https://docs.aiohttp.org/en/stable/_static/aiohttp-plain.svg",
},
"Pandas": {
"library_name": "Pandas",
"releases_url": "https://pypi.org/project/pandas/",
"logo": "https://pandas.pydata.org/static/img/pandas_mark.svg",
},
"FastAPI": {
"library_name": "FastAPI",
"releases_url": "https://pypi.org/project/fastapi/",
"logo": "https://fastapi.tiangolo.com/img/icon.png",
},
"Django": {
"library_name": "Django",
"releases_url": "https://pypi.org/project/Django/",
"logo": "https://static.djangoproject.com/img/logos/django-logo-negative.png",
},
"Seaborn": {
"library_name": "Seaborn",
"releases_url": "https://pypi.org/project/seaborn/",
"logo": "https://seaborn.pydata.org/_images/logo-wide-lightbg.svg",
},
"TensorFlow": {
"library_name": "TensorFlow",
"releases_url": "https://pypi.org/project/tensorflow/",
"logo": "https://www.tensorflow.org/images/tf_logo_social.png",
},
"Keras": {
"library_name": "Keras",
"releases_url": "https://pypi.org/project/keras/",
"logo": "https://keras.io/img/logo.png",
},
"PyTorch": {
"library_name": "PyTorch",
"releases_url": "https://pypi.org/project/torch/",
"logo": "https://pytorch.org/assets/images/pytorch-logo.png",
},
"SQLAlchemy": {
"library_name": "SQLAlchemy",
"releases_url": "https://pypi.org/project/SQLAlchemy/",
"logo": "https://www.sqlalchemy.org/img/sqla_logo.png",
},
"BeautifulSoup": {
"library_name": "BeautifulSoup",
"releases_url": "https://pypi.org/project/beautifulsoup4/",
"logo": "https://www.crummy.com/software/BeautifulSoup/10.1.jpg",
},
"LangChain": {
"library_name": "LangChain",
"releases_url": "https://pypi.org/project/langchain/",
"logo": "https://python.langchain.com/img/brand/wordmark-dark.png",
},
"CrewAI": {
"library_name": "CrewAI",
"releases_url": "https://pypi.org/project/crewai/",
"logo": "https://cdn.prod.website-files.com/66cf2bfc3ed15b02da0ca770/66d07240057721394308addd_Logo%20(1).svg",
},
"Flask": {
"library_name": "Flask",
"releases_url": "https://pypi.org/project/Flask/",
"logo": "https://flask.palletsprojects.com/en/stable/_static/flask-vertical.png",
},
"Pygame": {
"library_name": "Pygame",
"releases_url": "https://pypi.org/project/pygame/",
"logo": "https://www.pygame.org/images/logo_lofi.png",
},
"Thinker": {
"library_name": "Thinker",
"releases_url": "https://pypi.org/project/thinker/",
"logo": "https://keras.io/img/logo.png",
},
"Plotly": {
"library_name": "Plotly",
"releases_url": "https://pypi.org/project/plotly/",
"logo": "https://plotly.com/static/img/logos/plotly-logomark.svg",
},
"MlForecast": {
"library_name": "MlForecast",
"releases_url": "https://pypi.org/project/mlforecast/",
"logo": "https://raw.githubusercontent.com/Nixtla/mlforecast/main/nbs/figs/logo.png",
},
"GeoPandas": {
"library_name": "GeoPandas",
"releases_url": "https://pypi.org/project/geopandas/",
"logo": "https://geopandas.org/en/stable/_static/geopandas_logo_web.svg",
},
"AirFlow": {
"library_name": "AirFlow",
"releases_url": "https://pypi.org/project/apache-airflow/",
"logo": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/de/AirflowLogo.png/800px-AirflowLogo.png?20191014185111",
},
"PySpark": {
"library_name": "PySpark",
"releases_url": "https://pypi.org/project/pyspark/",
"logo": "https://spark.apache.org/docs/latest/api/python/_static/spark-logo-reverse.png",
},
"Gym": {
"library_name": "Gym",
"releases_url": "https://pypi.org/project/gym/",
"logo": "https://www.gymlibrary.dev/_static/img/gym_logo_black.svg",
},
"HyperOpt": {
"library_name": "HyperOpt",
"releases_url": "https://pypi.org/project/hyperopt/",
"logo": "https://camo.githubusercontent.com/d9cabe82cdc7bff598f84d61b0a8921cd5c3ceb0716b03399fc31db1a2a23182/68747470733a2f2f692e706f7374696d672e63632f54506d66665772702f68797065726f70742d6e65772e706e67",
},
"Streamlit": {
"library_name": "Streamlit",
"releases_url": "https://pypi.org/project/streamlit/",
"logo": "https://streamlit.io/images/brand/streamlit-mark-color.png",
},
"Crawl4ai": {
"library_name": "Crawl4ai",
"releases_url": "https://crawl4ai.com/mkdocs/blog/",
"logo": "https://star-history.com/#unclecode/crawl4ai&Date",
},
}
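
A registry like this is easy to break with a copy-paste slip, so a small consistency check may help. The sketch below is an assumption-laden illustration: `find_problems` and `REQUIRED_KEYS` are hypothetical names, the rules (all three fields present, `library_name` equal to the dict key, `https` URLs) are inferred from the entries above, and `sample` is a made-up two-entry stand-in for `bibliotecas`.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = {"library_name", "releases_url", "logo"}


def find_problems(libraries):
    """Return human-readable messages for entries that break the assumed rules."""
    problems = []
    for key, entry in libraries.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"{key}: missing {sorted(missing)}")
            continue
        if entry["library_name"] != key:
            problems.append(f"{key}: library_name is {entry['library_name']!r}")
        for field in ("releases_url", "logo"):
            if urlparse(entry[field]).scheme != "https":
                problems.append(f"{key}: {field} is not an https URL")
    return problems


# Illustrative data only; the second entry is deliberately incomplete.
sample = {
    "Requests": {
        "library_name": "Requests",
        "releases_url": "https://pypi.org/project/requests/",
        "logo": "https://requests.readthedocs.io/en/latest/_static/requests-sidebar.png",
    },
    "Broken": {"library_name": "Broken", "releases_url": "http://example.org/"},
}
print(find_problems(sample))
```

Running the same function over the real `bibliotecas` dict would only confirm its shape; it cannot tell whether a logo URL actually points at the right image.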
2 changes: 1 addition & 1 deletion cacheVariables.py
@@ -1 +1 @@
pynews = {}
pynews = {}