Hello!
I have a Python library that provides an interface for adding dataset metadata to a database, and I'd like to rewrite it as an Airflow DAG so that the ETL process can be triggered from the nice Airflow web interface instead of from a console. The catch is that each dataset gets its own implementation of that interface, so the amount of code grows constantly, and each dataset implementation is run against the prod DB only once.
I'd like to keep the logic in a package separate from the DAGs and not rewrite it too much. In the end, I want to call the interface methods from the DAG tasks and pass the name of the implementation class to use as a DAG parameter.
I would also like to use git versioning for the ETL interface library, so a dev could do `git pull branch`, pass the name of the implementation they just wrote, and trigger the DAG to test how it works. That would also make it easier to keep track of changes if we need to extend the interface with new entities in the future.
So, basically, this is like the GoF Strategy pattern, except I also want to fit it inside an Airflow DAG for the user's convenience.
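To make the idea concrete, here is a minimal sketch of the selection-by-name part in plain Python, with no Airflow involved. All names in it (`DatasetLoader`, `ExampleDatasetLoader`, `run_etl`, and the `extract`/`load` methods) are hypothetical stand-ins for the real interface, and the registry via `__init_subclass__` is just one possible way to map a class name to an implementation.

```python
from abc import ABC, abstractmethod


class DatasetLoader(ABC):
    """Hypothetical interface; stands in for the real ETL library's one."""

    # Maps class name -> implementation class, filled in automatically.
    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        DatasetLoader.registry[cls.__name__] = cls

    @abstractmethod
    def extract(self):
        """Return the metadata rows for one dataset."""

    @abstractmethod
    def load(self, rows):
        """Write the rows to the database and return how many were written."""


class ExampleDatasetLoader(DatasetLoader):
    """Hypothetical per-dataset implementation (one strategy)."""

    def extract(self):
        return [{"name": "example", "size": 3}]

    def load(self, rows):
        # A real implementation would write to the database here.
        return len(rows)


def run_etl(implementation):
    """Look up an implementation by class name and run it."""
    try:
        cls = DatasetLoader.registry[implementation]
    except KeyError:
        raise ValueError(f"Unknown implementation: {implementation!r}") from None
    loader = cls()
    return loader.load(loader.extract())
```

In the DAG, I imagine a task would read the class name from the run parameters and pass it to something like `run_etl`, so the DAG file itself never needs to change when a new implementation is added.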
How can I do this? I'm aware git-sync exists, but it seems to work only for the DAGs themselves. It also pulls changes from the repo automatically, whereas I would like the user to trigger the pull manually, from a branch of their choosing.
Another problem: suppose we put our ETL code somewhere in the DAG directory. All the imports inside the package are relative to the root of the package, and relative imports cause Airflow trouble! That's an even bigger issue.
I spent a long time thinking about this, and now I'd like to hear some fresh opinions. How would you go about this task? How can I make plugging new implementations into the DAG and selecting them via DAG input parameters both user-friendly and trackable?