Pyodbc install erroring #1024

Open
FabianDS2 opened this issue Mar 9, 2021 · 20 comments
Labels
airflow, on-hold (Issues or Pull Requests with this label will never be considered stale)

Comments

@FabianDS2

I am relatively new to Airflow, Docker, and Bitnami, but I am having trouble getting pyodbc installed on the Bitnami Airflow containers. I want to be able to use Airflow on Azure for work projects, which is how I found the bitnami-docker-airflow project.

I have followed the directions from this page: https://github.com/bitnami/bitnami-docker-airflow/blob/master/README.md

I started with curl -sSL https://raw.githubusercontent.com/bitnami/bitnami-docker-airflow/master/docker-compose.yml > docker-compose.yml and got the docker-compose.yml file in my documents folder. I then edited it to mount a local folder with the DAG files I wanted to use, and mounted another folder containing a requirements.txt file.

The lines below in the docker-compose.yml are the ones I added for mounting. ./dags and ./packages are the folders that the DAG .py and requirements.txt files are in. My docker-compose file is attached.

  • ./dags:/opt/bitnami/airflow/dags # added this!
  • ./packages:/bitnami/python/ # added this!

The requirements.txt has the text pyodbc===4.0.30 as the only content in the file.
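Pieced together, the added mounts would sit in the compose file's volumes section roughly like this (a sketch only; the surrounding service definition is abbreviated and the image tag may differ from the actual Bitnami compose file):

```yaml
services:
  airflow:
    image: bitnami/airflow:2
    volumes:
      - ./dags:/opt/bitnami/airflow/dags   # local DAG .py files
      - ./packages:/bitnami/python/        # folder containing requirements.txt
```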

I run docker-compose up to get the containers up and running.

The output shows me that the pyodbc install is failing, but I can't figure out the source of the error or what could fix it. I will attach the copied output that shows the error; I'm having a hard time interpreting it. I have tried the docker-compose without mounting the requirements file, and Airflow starts up and I can see my DAGs at localhost:8080 as I would expect. I want to be able to use pyodbc in my DAGs, though.

Please let me know if I can provide more context.

docker compose errors.txt
docker-compose file.txt

@rafariossaa
Contributor

Hi,
The issue here is that our Airflow image does not include gcc (a C compiler), hence this error:

airflow-scheduler_1  |   unable to execute 'gcc': No such file or directory
airflow-scheduler_1  |   error: command 'gcc' failed with exit status 1

@FabianDS2
Author

Thank you @rafariossaa for the explanation. I am relatively new to Docker, so is there a location in the Dockerfile/docker-compose.yml that I should look at, or research further, where I could add that dependency? For future planning, as I want to use the Bitnami image for a production workflow in Azure, is there a way to add that dependency on the Azure image, maybe once it's set up and deployed? Looking forward to any guidance you can give; anything is helpful!

@rafariossaa
Contributor

Hi,
You can add build-essential to the list of packages here, and build your own image.
I am not very sure about your Azure question. Once you have your own image, push it to a registry (e.g. Docker Hub) and you will be able to use it from wherever you need.
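That extension can be sketched as a tiny Dockerfile (a sketch, not an official image; the base tag, the package list, and the `myorg/airflow-odbc` name below are placeholders you would adjust):

```Dockerfile
FROM bitnami/airflow:2
USER root
# install_packages is Bitnami's wrapper around apt-get;
# build-essential provides gcc/g++/make, and unixodbc-dev supplies
# the ODBC headers that pyodbc compiles against
RUN install_packages build-essential unixodbc-dev
USER 1001
```

Build and tag it with something like `docker build -t myorg/airflow-odbc .`, then point the compose file at the new image.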

@stale

stale bot commented Apr 14, 2021

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@stale stale bot added the "stale (15 days without activity)" label Apr 14, 2021
@github-actions

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@msmerc

msmerc commented Apr 27, 2021

We also see this issue!

The airflow documentation says to do:

pip install apache-airflow[odbc]

http://airflow.apache.org/docs/apache-airflow-providers-odbc/stable/connections/odbc.html

But this leads to the same gcc error.

We'd expect this to work out of the box, given that all the other Airflow providers do. Is it something that could be added, do you think?

@rafariossaa
Contributor

Hi,
I am creating an internal task to evaluate and add the build tools to our container image.
We will come back as soon as we have news.

@rafariossaa rafariossaa reopened this Apr 27, 2021
@rafariossaa rafariossaa added the "on-hold (Issues or Pull Requests with this label will never be considered stale)" label and removed the "stale (15 days without activity)" label Apr 27, 2021
@msmerc

msmerc commented Apr 27, 2021

@rafariossaa - thanks, appreciated.

If you run airflow info in this image once it is up, apache-airflow-providers-odbc is conspicuously absent. I'm guessing it's all related.

@rafariossaa
Contributor

Hi,
Yes, I guess it is a consequence of the compilation error.

@ShawnMcGough

Bitnami has suggested here that this is by design.

This is how I solved it.
https://www.shawnmcgough.com/airflow-connect-to-sql-server-mssql/

@rafariossaa
Contributor

Hi @ShawnMcGough
Thanks for the feedback and for creating that blog post, that will be super useful for other users.
On our side, I think it would be nice to have the compile tools added to the airflow image by default. That way users won't need to extend the image just to include them.

@FabianDS2
Author

Thanks for the article @ShawnMcGough - that will be helpful in the meantime. Our use case is getting Airflow working on MS Azure. On Azure, to get Airflow set up properly on App Services, you have to select the Bitnami image you want it to provision a container for (the images are already "loaded", in that you pick from the available Bitnami images). I don't think you can modify the image, so having this as part of the official Bitnami image would make the Azure App Services one work for our use case. We're pretty early on in our Azure (and Airflow) learning, so the easier it can be, the better ;)

@FabianDS2
Author

From your profile it looks like you may actually be an Azure expert, @ShawnMcGough? We're a data science team of two people just getting to the cloud, so configuring this type of stuff is super new, let alone Docker, etc. We figured the pre-provided Bitnami images were the easiest way to go.

@ShawnMcGough

@FabianDS2 I hadn't thought to deploy Airflow within an App Service. Also, there are many deploy variants that can quickly add complexity (which it sounds like you're discovering)! Which method are you using, exactly?

Unfortunately, connecting to Microsoft SQL Server is a feature not available 'out of the box' for any of the options currently available.

@FabianDS2
Author

@ShawnMcGough up until now, I've just been working with Airflow locally to get a feel for how it works, since it was brand new. I took the DataCamp class on it and built the most basic POC I could locally.

Our likely goal, though, is to get it to the point where it could be deployed to a non-local "site" where we could log in and manage/run pipelines, executing our SQL queries to produce a modeling data set to feed to ML algorithms or a training script (which could also be part of the pipeline). That's where the Azure part comes in. On Azure Marketplace, Bitnami has those Airflow containers, and as far as I can tell they deploy to Azure App Services. Since we don't have DevOps resources on our team, that seemed like the best option.

I think we could get away with just having Airflow set up on the Remote Desktop we use for data science development / model training, etc. The Azure part is probably our thought on how to "professionalize" it, if I can make up a phrase. Eventually, it might be more than just us using Airflow (maybe some other people in Analytics - the department we sit in - will use it too), so that's where the App Services environment may become helpful.

But like you said, you can only choose between the Bitnami images that are already created without a lot of space for customization as far as I'm aware

@alexisaraya

Hi, I have followed the thread, and my question is whether this is still in the same state: is it still not possible to connect to MSSQL with the Bitnami Airflow image?
Thank you for your clarification

@rafariossaa
Contributor

rafariossaa commented Apr 19, 2022

You can, but right now you need to add the connector yourself.

@yimingpeng

yimingpeng commented May 6, 2022

Not sure whether this thread is still active, but I thought it would be helpful for those still struggling to get Bitnami Airflow to connect to MSSQL. Here is how I rebuilt the image to make it work (using the scheduler as an example):

FROM bitnami/airflow-scheduler:latest
USER root
# Change the default shell to Bash
SHELL ["/bin/bash", "-c"]
RUN rm /bin/sh && ln -s /bin/bash /bin/sh
# Install Linux build dependencies
RUN install_packages gcc unixodbc-dev unixodbc libpq-dev g++ build-essential python3-dev
# Install the Microsoft SQL Server ODBC driver for Debian
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN ACCEPT_EULA=Y install_packages msodbcsql18 mssql-tools18
# Put the mssql-tools binaries on the PATH. ENV persists across layers;
# `RUN source ~/.bashrc` would only affect that single build step.
ENV PATH="$PATH:/opt/mssql-tools18/bin"
# Work around the MSSQL driver TLS issue
RUN chmod +rwx /etc/ssl/openssl.cnf
RUN sed -i 's/TLSv1.2/TLSv1/g' /etc/ssl/openssl.cnf
RUN sed -i 's/SECLEVEL=2/SECLEVEL=1/g' /etc/ssl/openssl.cnf
USER 1001

Hope this helps.

Something to note:

  1. I had to downgrade TLS from 1.2 to 1.0 because I was hitting a "Connection Timeout Expired" error: our vendor's MSSQL server doesn't support TLS 1.2. More information: SqlClient troubleshooting guide
  2. Also, please don't forget to add the Airflow ODBC provider to your requirements.txt, i.e., apache-airflow-providers-odbc

Happy coding :).
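Once the driver is installed, a DAG task hands pyodbc a connection string. A minimal sketch of building one (the server, database, and credentials below are made-up placeholders; the driver name must match what the msodbcsql18 package registers):

```python
from urllib.parse import quote_plus

# All hostnames and credentials here are placeholders for illustration.
odbc_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.example.com,1433;"
    "DATABASE=mydb;"
    "UID=airflow;"
    "PWD=s3cret;"
    "TrustServerCertificate=yes;"
)

# If the connection is stored as an Airflow URI instead, the driver
# name has to be URL-encoded for the query string.
encoded_driver = quote_plus("ODBC Driver 18 for SQL Server")

print(odbc_str)
print(encoded_driver)  # ODBC+Driver+18+for+SQL+Server
```

The string would then be passed to `pyodbc.connect(odbc_str)` inside a task, or stored as an Airflow connection so the ODBC provider can pick it up.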

@rafariossaa
Contributor

Hi @yimingpeng ,
Thank you very much for providing an example.

@carrodher
Member

We are going to transfer this issue to bitnami/containers

In order to unify the approaches followed in Bitnami containers and Bitnami charts, we are moving some issues in bitnami/bitnami-docker-<container> repositories to bitnami/containers.

Please follow bitnami/containers to stay updated about the latest Bitnami images.

More information here: https://blog.bitnami.com/2022/07/new-source-of-truth-bitnami-containers.html

@carrodher carrodher transferred this issue from another repository Jul 28, 2022
@bitnami-bot bitnami-bot added the "triage (Triage is needed)" label Jul 28, 2022
@bitnami-bot bitnami-bot removed the "triage (Triage is needed)" label Jul 28, 2022
Projects
None yet
Development

No branches or pull requests

9 participants