Using a Ruby script only, turn the data in the last worksheet of the attached workbook into a nested Hash where:
- the outermost keys are the values for column D (Device_Type.name)
- the next outermost are the corresponding values for column E (Device_Model.manufacturer_identifier)
- the next outermost are the corresponding values for column I (Device_Model.model_identifier_concatenated_with_hardware_version)
The corresponding values at the innermost level are each a hash composed as:
{
firmware_version: the value in *column J* (Device_Model.firmware_version),
smets_chts_version: the value in *column K* (SMETS_CHTS Version.Version_number_and_effective_date),
gbcs_version: the value in *column L* (GBCS Version.version_number),
image_hash: the value in *column M* (Manufacturer_Image.hash)
}
The data from all rows in the worksheet must be included in the Hash, except those marked "Removed" in column C, which must be excluded. Add comments to explain your solution and why you believe it is optimal.
The script has been updated to translate either a csv file (e.g. "Central-Products-Lis.csv") or the last sheet of an xlsm workbook (e.g. "Central-Products-List_sheets.xlsm") into a nested hash, provided the data in question contains the headers stated in the brief above.
Previously, the csv version of the last sheet of the xlsm file had been obtained manually. Following prompts by the technical director, the script has been amended to use the Roo gem to parse the workbook, and a method has been added that standardizes that sheet into csv so that it can be translated in the same way.
The data_parse method has been amended. Previously it was defined in the global scope, which made it a private method of Object, called on the top-level main object in a procedural style (which Ruby supports). It has now been wrapped inside a class. The benefits are that we now adhere to an object-oriented way of programming, can better organise the code, can accommodate extensions, and ultimately give better context to the data_parse method. Lastly, if we were to have several data sources, we could easily reuse the class and its methods, thereby saving time and effort.
Were the program to have fewer moving parts, it might make sense to keep it as a standalone method (as is the case for "bin/lib/hashify_csv.rb"). In this case, however, a class makes more sense.
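As a rough illustration of the class-based design described above, the sketch below wraps a data_parse method in a class. The class name ProductListParser, the constructor, and the private helper are hypothetical; only data_parse is named in this README. The sketch also leaves out the handling of keys that are already in use, so a repeated column I value simply overwrites.

```ruby
require 'csv'

# Hypothetical class name; only data_parse is named in this README.
class ProductListParser
  # Column headers as they appear in the csv header row.
  TYPE  = 'Device_Type.name'
  MFG   = 'Device_Model.manufacturer_identifier'
  MODEL = 'Device_Model.model_identifier_concatenated_with_hardware_version'

  def initialize(csv_text)
    @csv_text = csv_text
  end

  # Builds the three outer levels. Collision handling is left out of this
  # sketch, so a repeated column I key overwrites the earlier row.
  def data_parse
    CSV.parse(@csv_text, headers: true).each_with_object({}) do |row, result|
      # Assumes the status column holds exactly the string "Removed".
      next if row['Entry.status'] == 'Removed'

      models = ((result[row[TYPE]] ||= {})[row[MFG]] ||= {})
      models[row[MODEL]] = inner_hash(row)
    end
  end

  private

  # A helper that only makes sense in the context of this class.
  def inner_hash(row)
    {
      firmware_version: row['Device_Model.firmware_version'],
      smets_chts_version: row['SMETS_CHTS Version.Version_number_and_effective_date'],
      gbcs_version: row['GBCS Version.version_number'],
      image_hash: row['Manufacturer_Image.hash']
    }
  end
end
```

Each data source would then get its own instance, for example ProductListParser.new(File.read('source.csv')).data_parse.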
The script turns the data in the file into a hash. To use it, run the command followed by the path to the source file and then the path to the destination file (e.g. destination.rb).
Steps:
- Clone this repo:
git clone [repo]
- Install dependencies:
bundle install
- Run the specs:
rspec -fd
- Run the script, passing the csv or xlsm file path as the first argument and the destination path as the second argument:
bin/csv-to-hash Central-Products-Lis.csv destination.rb
OR:
bin/csv-to-hash Central-Products-List_sheets.xlsm destination.rb
E.g. bin/csv-to-hash source.csv destination.rb
OR bin/csv-to-hash somewhere/source.csv somewhere_else/destination.rb
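The two-argument invocation above could be handled along these lines. This is only a sketch under stated assumptions: the method name write_hash_file and the injected parser callable are hypothetical, the output format (pretty_inspect) is a guess at what the real script writes, and only plain-text csv sources are covered, not xlsm.

```ruby
require 'pp'

# Hypothetical entry point for a script such as bin/csv-to-hash.
# `parser` is any callable that turns the source text into a Hash (assumed).
def write_hash_file(argv, parser)
  source, destination = argv
  raise ArgumentError, 'usage: bin/csv-to-hash SOURCE DESTINATION' unless source && destination

  hash = parser.call(File.read(source))
  # pretty_inspect emits a Ruby literal, so the destination can be a .rb file.
  File.write(destination, hash.pretty_inspect)
  destination
end
```

Passing the parser in as an argument keeps the file handling separate from the translation logic, so either can be tested on its own.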
The testing suite used is RSpec. The process followed was BDD. SimpleCov is used to check for code coverage.
The strategy that I used is based on the fact that when we assign a hash to a key in a nested hash and that key is already in use, the existing value is overwritten by the incoming one. We therefore need a strategy that ensures existing data is not lost when new data arrives under the same key. The basic shape required by the brief is:
{
  "Column D value"=>{
    "Column E value"=>{
      "Column I value"=>{
        :firmware_version=> "Column J value",
        :smets_chts_version=> "Column K value",
        :gbcs_version=> "Column L value",
        :image_hash=> "Column M value"
      }
    }
  }
}
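To make the overwriting problem concrete, here is a minimal sketch with hypothetical, simplified rows. The method name naive_nest and the symbol keys are illustrative only and are not part of the script. Two rows share the same three outer keys, and plain assignment keeps only the second.

```ruby
# Hypothetical helper: nests rows three levels deep using plain assignment.
def naive_nest(rows)
  rows.each_with_object({}) do |row, result|
    manufacturers = (result[row[:type]] ||= {})
    models = (manufacturers[row[:manufacturer]] ||= {})
    # Plain assignment replaces anything already stored under this model key.
    models[row[:model]] = { firmware_version: row[:firmware_version] }
  end
end

rows = [
  { type: 'Type_Communications', manufacturer: 'Mfg_1', model: '1.1.0', firmware_version: '1.1.1' },
  { type: 'Type_Communications', manufacturer: 'Mfg_1', model: '1.1.0', firmware_version: '1a.1.1' }
]

# Only the row with firmware_version '1a.1.1' survives under '1.1.0'.
p naive_nest(rows)
```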
I was uncertain whether it was "valid" to use an array to store the inner hashes, since the brief only mentions a nested hash. For the inner hashes, I therefore first took an approach that checked whether the key was already in use and, if so, switched the firmware value to be the key for the remaining key/value pairs of the same inner hash. This way we would not lose any of the data, and we would still have the speed of the hash table.
However, I then realised that once we got to the last value in the inner hash, we would either have to create a key for it using an index number (which arrays give us for free), or make it a value inside an array anyway. (I have left some examples of what I mean at the end of the file.)
I felt the optimal thing to do was to place the inner hashes inside an array early on, as we still don't lose any data. We can still leverage the speed of hash lookup, but only down to the column I key.
The reasoning is that with the first approach, once we reach the last inner value, the image_hash has to be stored in an array anyway, by which point the inner hash has lost its original "shape". It is therefore easier to retain the shape of the inner hash from the start; if we need to perform a comparison on any of the inner hashes, we can do so by iterating over the array.
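The trade-off described above (hash lookups for the outer keys, then iteration over the array) might look like this in practice. Both helper names are hypothetical and exist only for illustration; the sketch assumes the value under a column I key is either a single inner hash or an array of them.

```ruby
# Hypothetical helper: the value under a column I key is either a single
# inner hash or an array of them, so normalise it to an array first.
def inner_entries(value)
  value.is_a?(Array) ? value : [value]
end

# Hash lookups for the three outer keys, then a linear scan of the array.
def find_by_firmware(nested, type, manufacturer, model, firmware)
  value = nested.dig(type, manufacturer, model)
  return nil if value.nil?

  inner_entries(value).find { |entry| entry[:firmware_version] == firmware }
end
```

Note that Kernel#Array is deliberately avoided here, because Array(some_hash) turns a hash into a list of key/value pairs rather than wrapping it.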
I therefore chose to focus on this line in the requirement: "The corresponding values at the innermost level are each a hash composed as […]". With this second approach, the innermost hash is never changed and always retains the key/value relationship already in place.
If the column I value is already in use as a key, that key will instead point to an array containing the different instances of the inner hash.
expected = {
  'Type_Communications' => {
    'Mfg_1' => {
      '1.1.0' => [
        { # The inner hash already stored under this key
          firmware_version: '1.1.1',
          smets_chts_version: 'CHTS V1',
          gbcs_version: 'GBCS Version 1',
          image_hash: 'image1hash'
        },
        { # The incoming inner hash keeps the identical shape
          :firmware_version => "1a.1.1",
          :gbcs_version => "GBCS Version 1a",
          :smets_chts_version => "CHTS V1a",
          :image_hash => "image1Ahash"
        }
      ]
    }
  }
}
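The array-on-collision behaviour shown above can be sketched as follows. The method name nest_with_arrays and the two constants are hypothetical (the README's own method is data_parse), and the sketch assumes that column C holds exactly the string "Removed" for excluded rows. The header names are those of the csv header row.

```ruby
require 'csv'

# Header names as they appear in the csv header row (columns D, E and I).
OUTER_HEADERS = [
  'Device_Type.name',
  'Device_Model.manufacturer_identifier',
  'Device_Model.model_identifier_concatenated_with_hardware_version'
].freeze

# Inner hash keys mapped to their csv headers (columns J, K, L and M).
INNER_HEADERS = {
  firmware_version: 'Device_Model.firmware_version',
  smets_chts_version: 'SMETS_CHTS Version.Version_number_and_effective_date',
  gbcs_version: 'GBCS Version.version_number',
  image_hash: 'Manufacturer_Image.hash'
}.freeze

# Hypothetical method name; the README's own method is data_parse.
def nest_with_arrays(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, result|
    # Assumes the status column holds exactly the string "Removed".
    next if row['Entry.status'] == 'Removed'

    type, manufacturer, model = OUTER_HEADERS.map { |header| row[header] }
    inner = INNER_HEADERS.transform_values { |header| row[header] }
    models = ((result[type] ||= {})[manufacturer] ||= {})
    existing = models[model]

    models[model] =
      case existing
      when nil   then inner              # first row for this key: plain hash
      when Array then existing << inner  # already an array: append
      else [existing, inner]             # second row: promote to an array
      end
  end
end
```

A key that is seen only once keeps a plain inner hash, so the array appears only where it is needed to avoid losing data.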
This is an example of what I had originally thought to do but considered less optimal:
expected = {
  'Type_Communications' => {
    'Mfg_1' => {
      '1.1.0' => {
        '1a.1.1' => {
          "GBCS Version 1" => {
            :smets_chts_version => 'CHTS V1',
            :image_hash => 'image1hash'
          },
          "GBCS Version 1a" => {
            :smets_chts_version => "CHTS V1a",
            :image_hash => "image1Ahash"
          },
          "GBCS Version 1b" => {
            :smets_chts_version => 'CHTS V1b',
            :image_hash => 'image1Ahash'
          },
          "GBCS Version 1c" => {
            :smets_chts_version => 'CHTS V1b',
            :image_hash => 'image1Ahash'
          }
        }
      }
    }
  }
}
it 'It groups with duplicate headers/keys for "fifth inner-key" - 2 values' do
  input_csv = <<~CSV
    Version,Entry.number,Entry.status,Device_Type.name,Device_Model.manufacturer_identifier,Device_Model.model_identifier,Device_Model.hardware_version.version,Device_Model.hardware_version.revision,Device_Model.model_identifier_concatenated_with_hardware_version,Device_Model.firmware_version,SMETS_CHTS Version.Version_number_and_effective_date,GBCS Version.version_number,Manufacturer_Image.hash
    Version 1,1,Current,Type_Communications,Mfg_1,Model_1,1.0.0,AC,1.1.0,1a.1.1,CHTS V1,GBCS Version 1,image1hash
    Version 1,1,Current,Type_Communications,Mfg_1,Model_1,1.0.0,AC,1.1.0,1a.1.1,CHTS V1a,GBCS Version 1,image1Ahash
  CSV
  expected = {
    'Type_Communications' => {
      'Mfg_1' => {
        '1.1.0' => {
          '1a.1.1' => {
            "GBCS Version 1" => {
              "CHTS V1" => {
                :image_hash => 'image1hash'
              },
              "CHTS V1a" => {
                :image_hash => "image1Ahash"
              }
            }
          }
        }
      }
    }
  }
  actual = data_parse(input_csv)
  expect(actual).to eq expected
end
# If the 'gbcs' version is already in use for an incoming hash, then we will have to store the image hash in some kind of array.
# The question is which way of storing the last value is better. We know that if this happens, it will only happen once, so speed at this stage is maybe not an issue,
# even though the brief specifies a nested hash. That could mean any of the options below.
# Thoughts on ending:
## OPTION 1
expected = {
  'Type_Communications' => {
    'Mfg_1' => {
      '1.1.0' => {
        '1a.1.1' => {
          "GBCS Version 1" => {
            'CHTS V1' => [
              {
                :image_hash => 'image1hash'
              },
              {
                :image_hash => "image2Ahash"
              }
            ]
          }
        }
      }
    }
  }
}
## OPTION 2
# An index does this for free though
expected = {
  'Type_Communications' => {
    'Mfg_1' => {
      '1.1.0' => {
        '1a.1.1' => {
          "GBCS Version 1" => {
            'CHTS V1' => {
              0 => 'image1hash',
              1 => "image2Ahash"
            }
          }
        }
      }
    }
  }
}
## OPTION 3
expected = {
  'Type_Communications' => {
    'Mfg_1' => {
      '1.1.0' => {
        '1a.1.1' => {
          "GBCS Version 1" => {
            'CHTS V1' => [
              'image1hash',
              "image2Ahash"
            ]
          }
        }
      }
    }
  }
}