Interval Filtering not working for EventTime column of type 'datetime' #53

Closed
Timm638 opened this issue Nov 19, 2024 · 0 comments · Fixed by #72
Labels: bug (Something isn't working)

Timm638 commented Nov 19, 2024

Describe the bug

The filter() method of Interval Filtering does not filter if the EventTime column is of type datetime.
filter() fails at its last line, where the internal pandas DataFrame is converted into a PySpark DataFrame.

To Reproduce

  1. Download the example data:
    Actual Generation per Production Type_2024-2025.csv

  2. Execute the following code:

from rtdip_sdk.pipelines.data_wranglers import NormalizationMean, NormalizationZScore, NormalizationMinMax, Denormalization, DuplicateDetection
from rtdip_sdk.pipelines.data_wranglers import KSigmaAnomalyDetection, IntervalFiltering
import pandas as pd
import matplotlib.pyplot as plt
from pyspark.sql import SparkSession

spark_session = SparkSession.builder.master("local[2]").appName("test").getOrCreate()

source_df = pd.read_csv('./Actual Generation per Production Type_2024-2025.csv')
df = source_df
df['MTU-Start'] = pd.to_datetime(df['MTU'].apply(lambda x: x.split('-')[0]), dayfirst=True)
df['Solar  - Actual Aggregated [MW]'] = pd.to_numeric(df['Solar  - Actual Aggregated [MW]'], errors='coerce')
df = df.set_index('MTU-Start')

april_2_week = pd.to_datetime('07.04.2024', dayfirst=True)
april_mid = pd.to_datetime('14.04.2024', dayfirst=True)

april_df = df[april_2_week:april_mid]
intf_april_df = april_df.copy(deep=True)
intf_april_df = intf_april_df['Solar  - Actual Aggregated [MW]'].to_frame()
intf_april_df['EventTime'] = pd.to_datetime(april_df.index)
# uncomment the next line (converts EventTime to string) to make it work
# intf_april_df['EventTime'] = pd.Series(intf_april_df['EventTime'], dtype="string")


intf_ps_df = spark_session.createDataFrame(intf_april_df, ['Solar  - Actual Aggregated [MW]', 'EventTime'])
output_intf_df = IntervalFiltering(spark_session, df=intf_ps_df, interval=6, interval_unit='hours', time_stamp_column_name='EventTime').filter().toPandas()
  3. Receive the error message
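
The commented-out line in the script hints that the failure depends on the dtype of EventTime. The following is a minimal sketch of that difference using pandas only, with made-up sample rows instead of the CSV. It uses `dt.strftime` rather than the `dtype="string"` cast from the script so that the string format is explicit; whether IntervalFiltering accepts this exact format is an assumption, not something confirmed in this issue.

```python
import pandas as pd

# Made-up sample rows standing in for the CSV data (assumption, not real values).
df = pd.DataFrame({
    "Solar  - Actual Aggregated [MW]": [0.0, 120.5, 310.0],
    "EventTime": pd.to_datetime(
        ["2024-04-07 00:00", "2024-04-07 06:00", "2024-04-07 12:00"]
    ),
})

# State reported as failing: EventTime is a datetime64 column.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["EventTime"])

# Possible workaround: render the timestamps as strings with an explicit format
# before handing the frame to Spark / IntervalFiltering.
df_str = df.copy()
df_str["EventTime"] = df_str["EventTime"].dt.strftime("%Y-%m-%d %H:%M:%S")
first_value = df_str["EventTime"].iloc[0]

print(is_datetime, first_value)
```

This only works around the problem on the caller's side; the component itself should still accept datetime columns.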

Expected behavior

The component returns an output instead of raising an error.
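
To make the expectation concrete, here is a hypothetical pure-Python sketch of what a 6-hour interval filter might return for hourly data. It assumes the filter keeps the first row and then every row that lies at least one interval after the last kept row. That rule and the helper name `keep_every_interval` are assumptions for illustration, not the RTDIP implementation.

```python
from datetime import datetime, timedelta


def keep_every_interval(timestamps, interval):
    """Keep the first timestamp, then each one at least `interval` after the last kept one."""
    kept = []
    for ts in sorted(timestamps):
        if not kept or ts - kept[-1] >= interval:
            kept.append(ts)
    return kept


# Hourly sample timestamps for one day (made-up values, not the CSV data).
start = datetime(2024, 4, 7)
hourly = [start + timedelta(hours=h) for h in range(24)]

filtered = keep_every_interval(hourly, timedelta(hours=6))
print([ts.strftime("%H:%M") for ts in filtered])
# prints ['00:00', '06:00', '12:00', '18:00']
```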

Installation Setup (please complete the following information):

  • Pyspark Version: 3.5.3
  • Python Version: 3.11.8

@Timm638 Timm638 added the bug Something isn't working label Nov 19, 2024
@Timm638 Timm638 changed the title Interval Filtering not working for datetime EventTime columns Interval Filtering not working for Eventime of type 'datetime' Nov 19, 2024
@Timm638 Timm638 changed the title Interval Filtering not working for Eventime of type 'datetime' Interval Filtering not working for EventTime column of type 'datetime' Nov 19, 2024
@Timm638 Timm638 added the draft Draft to be reviwed by a PO label Nov 19, 2024
@luccalb luccalb moved this to Product Backlog in amos2024ws01-feature-board Nov 20, 2024
@luccalb luccalb removed the draft Draft to be reviwed by a PO label Nov 20, 2024
@dh1542 dh1542 self-assigned this Nov 24, 2024
dh1542 added a commit that referenced this issue Nov 25, 2024
Signed-off-by: Dominik Hoffmann
dh1542 added a commit that referenced this issue Nov 25, 2024
@dh1542 dh1542 linked a pull request Nov 26, 2024 that will close this issue
dh1542 added a commit that referenced this issue Nov 26, 2024
Signed-off-by: Dominik Hoffmann
dh1542 added a commit that referenced this issue Nov 26, 2024
Timm638 added a commit that referenced this issue Nov 26, 2024
…etime-datatype-column

Hotfix/#53 inteval filtering datetime datatype column
@github-project-automation github-project-automation bot moved this from Awaiting Review to Feature Archive in amos2024ws01-feature-board Nov 26, 2024
@dh1542 dh1542 moved this from Feature Archive to Awaiting Review in amos2024ws01-feature-board Nov 27, 2024
@sanalmert sanalmert moved this from Awaiting Review to Feature Archive in amos2024ws01-feature-board Nov 27, 2024