-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String Length Check #14
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few changes to the readme that I'd want to see.
Great work on the thorough tests. An improvement opportunity I see would reduce duplication inside of many of the tests. Move commonly used things into a test environment trait or collection of them and extend that environment in the test, e.g.
//old
it("minValue") {
val dict = new VarSubstitution
//… the rest
}
//new
it("minValue") in new VarSubstutitionSetup {
//… the rest
}
trait VarSubstitutionSetup {
val dict = new VarSubstitution
}
This is a really shallow example, but there are some other duplicate things or near duplicates that you could move into this trait or other traits, e.g. VarSubstitutionSetup with YamlDecoder
and that latter trait is defined as
trait YamlDecoder {
def decodeYaml(yaml: String) = io.circe.yaml.parser.parse(yaml).right.getOrElse(Json.Null)
}
This "cake building" would let you reduce your tests in many cases to only the assertions, maybe with one additional line that calls methods provided by the test environment traits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samratmitra-0812 This is great!
The tests are pretty consistent with the existing code, but @colindean comments are good. I think we should have a separate pr to cleanup the test like what Colin suggests. I tried to add some handy test functions in #13 but something like what colin suggested might be better.
I would be fine with just making the suggested changes to the README.md
and getting this merged, and fixing the tests in a future pr after #13 is merged.
Co-Authored-By: Colin Dean <[email protected]>
Co-Authored-By: Colin Dean <[email protected]>
Co-Authored-By: Colin Dean <[email protected]>
@dougb I completely agree with you. We should really take up on colin's suggestions and do the refactoring. Let us make sure we have a JIRA ticket for this. I have commited the suggestions to README.md. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I might have found a problem testing this on BR, need to dig a little more. Don't merge it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samratmitra-0812 This is really nice, thank you for contributing. I don't like the ordering assumption we make on the asserts on the tests. I am curious to hear what @dougb thinks. I am OK with addressing this in the test refactoring PR.
import org.apache.spark.sql.DataFrame | ||
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute | ||
import org.apache.spark.sql.catalyst.expressions._ | ||
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructType} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samratmitra-0812 DataType
is an unused import
|
||
import com.target.TestingSparkSession | ||
import com.target.data_validator._ | ||
import com.target.data_validator.validator.ValidatorBase.D0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused import
describe("configCheck") { | ||
|
||
it("error if min and max are not defined") { | ||
val dict = new VarSubstitution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is unused
} | ||
|
||
it("error if min and max are different types") { | ||
val dict = new VarSubstitution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is unused
} | ||
|
||
it("error if column is not found in df") { | ||
val dict = new VarSubstitution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is unused
} | ||
|
||
it("error if col type is not String") { | ||
val dict = new VarSubstitution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is unused
} | ||
|
||
it("minValue less than maxValue fails configCheck") { | ||
val dict = new VarSubstitution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict
is unused
assert(sut.failed) | ||
assert(sut.getEvents contains | ||
ValidatorCheckEvent(failure = true, "StringLengthCheck on column 'item'", 4, 2)) | ||
assert(sut.getEvents contains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samratmitra-0812 @dougb I think we need to be careful here, and in other tests below.
This assert is based on the original ordering of defData
. There is no guarantee that when converted to a dataframe, partitioned, and operated on that we will get results back in the same order. Since there are two potential modes of failure here this result could very well be the other one (empty string).
@phpisciuneri Thanks for your comments. I have removed the unused imports and variables. Nice catch on the assert ! I have made modifications by checking that exactly one of the 2 errors is present in the event list. The issue does not exist for other tests because for all of them, the number of invalid rows is equal to the specified value of numErrorsToReport. So we can expect errors for all the invalid rows to be present in the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clear my RC if you'll refactor the tests in a future PR 😉
EEDD-1020: String Length Check.
Unit Test results attached in https://jira.target.com/browse/EEDD-1020