### Missing Value Imputation

Once the dataset has been refined to exclude unwanted data points and potential sensor failures have been accounted for, the next step toward high-quality data is to address missing values through imputation. The component we developed first identifies and flags missing values by leveraging PySpark’s windowing and UDF capabilities. With these techniques, it dynamically determines the expected interval for each sensor by analyzing historical data patterns within defined partitions. It then applies spline interpolation to estimate the missing values, filling the gaps in the time series with plausible, mathematically derived substitutes. This not only improves the consistency of integrated datasets for data scientists but also helps prevent errors or biases in analytics and machine learning models.
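
The flag-then-interpolate idea can be sketched in plain Python for a single sensor. This is not the RTDIP implementation, which runs on PySpark with window functions and UDFs; it is a minimal in-memory illustration under stated assumptions: timestamps are integer seconds aligned to a regular grid, the expected interval is estimated with a simple median-gap heuristic, and gaps are filled with a natural cubic spline (solved with the Thomas algorithm) fitted through the observed readings. The helper names `expected_interval`, `natural_cubic_spline`, and `impute_missing` are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from statistics import median_low


def expected_interval(timestamps: list[int]) -> int:
    """Estimate the sampling interval as the median gap between consecutive readings.

    Assumption: this median heuristic is a stand-in for the pattern analysis
    described in the post; median_low always returns an actually observed gap.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return median_low(gaps)


def natural_cubic_spline(xs: list[float], ys: list[float]):
    """Fit a natural cubic spline (zero second derivative at both ends); return an evaluator."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; m[0] = m[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1]:
        # h[i-1]*m[i-1] + 2*(h[i-1]+h[i])*m[i] + h[i]*m[i+1] = 6*(slope_i - slope_{i-1})
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination ...
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        # ... followed by back substitution.
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(x: float) -> float:
        # Pick the segment [xs[i], xs[i+1]] containing x (clamped to the outer segments).
        i = max(0, min(bisect_right(xs, x) - 1, n - 1))
        a = (xs[i + 1] - x) / h[i]
        b = (x - xs[i]) / h[i]
        return (a * ys[i] + b * ys[i + 1]
                + ((a ** 3 - a) * m[i] + (b ** 3 - b) * m[i + 1]) * h[i] ** 2 / 6)

    return evaluate


def impute_missing(readings: dict[int, float]) -> list[tuple[int, float, bool]]:
    """Return (timestamp, value, imputed) rows on one sensor's expected time grid.

    Assumption: timestamps are integer seconds aligned to a regular grid.
    """
    ts = sorted(readings)
    if len(ts) < 2:
        raise ValueError("need at least two readings to estimate an interval")
    step = expected_interval(ts)
    spline = natural_cubic_spline([float(t) for t in ts], [readings[t] for t in ts])
    rows = []
    for t in range(ts[0], ts[-1] + 1, step):
        if t in readings:
            rows.append((t, readings[t], False))       # observed value, kept as-is
        else:
            rows.append((t, spline(float(t)), True))   # flagged gap, filled by the spline
    return rows


if __name__ == "__main__":
    # Hypothetical profile sampled every 60 s, with gaps at t=180 and t=360.
    readings = {0: 10.0, 60: 40.0, 120: 70.0, 240: 130.0, 300: 160.0, 420: 220.0, 480: 250.0}
    for row in impute_missing(readings):
        if row[2]:
            print(row)  # prints (180, 100.0, True) then (360, 190.0, True)
```

Because a natural cubic spline reproduces linear data exactly, the two gaps in this toy profile land exactly on the underlying line; on a real load profile, the filled values would instead follow the curvature of the neighboring readings. In a distributed setting, the same steps would run once per sensor partition rather than on a single in-memory dict.
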

To show how this is realized with the new RTDIP component, let me walk you through a short example of how a few lines of code can enhance an exemplary time series load profile:

```python
from rtdip_sdk.pipelines.data_quality import MissingValueImputation
# ...
```
To illustrate this visually, plotting the before-and-after DataFrames reveals that all gaps have been successfully filled with meaningful data.
### Normalization
Looking back, our regular team meetings were the key to our success. Through open communication and collaboration, we tackled challenges and continuously improved our processes. This showed us the power of working together in an agile framework and of growing as a dedicated Scrum team.

We’re excited about the future and how these advancements will help data scientists and engineers make better decisions.