Skip to content

BenjaminNeustadt/Nested_hash

Repository files navigation

Nested Hash

1. Brief

Using a ruby script only, turn the data in the last worksheet of the attached workbook in to a nested Hash where:

  • the outermost keys are the values for column D (Device_Type.name)

  • the next outermost are the corresponding values for column E (Device_Model.manufacturer_identifier)

  • the next outermost are the corresponding values for column I (Device_Model.model_identifier_concatenated_with_hardware_version)

The corresponding values at the innermost level are each a hash composed as:

{

  firmware_version: the value in *column J* (Device_Model.firmware_version),
  smets_chts_version: the value in *column K* (SMETS_CHTS Version.Version_number_and_effective_date),
  gbcs_version:  the value in *column L* (GBCS Version.version_number),
  image_hash: the value in *column M* (Manufacturer_Image.hash)

}

The data from all rows in the worksheet must be included in the Hash, except those marked "Removed" in column C, which must be excluded. Add comments to explain your solution and why you believe it is optimal

2. Update

The script has been updated to translate the data in a csv file (EG: "Central-Products-Lis.csv"), OR the last sheet in an xlsm workbook (EG: "Central-Products-Lilst_sheets.xlsm"), into a nested hash - provided the csv data in question contains the appropriate headers stated in the brief above.

Previously, the csv file at the end of the xlsm file had been attained manually; following prompts by the technical director, it has been ammended to use the Roo gem to handle the parse, followed by the creation of a method that standardizes that sheet into a csv file so that it can be appropriately translated.

The data_parse method has been ammended. Previously in the global scope, it operated as a private method on an instance of main Object in a procedural style (which Ruby supports).

It has now been wrapped inside a class. The benefits are that we now are adherring to an object oriented way of programming, can better organise the code, accomodate extensions, and ultimately give better context to the data_parse method. Lastly, if we were to have several data sources, we could easily reuse the class and its methods - therein saving time and effort.

Were the program to have fewer moving parts, it might make sense to keep it as a standalone method (as is the case for the "bin/lib/hashify_csv.rb"). That said, in this case it makes greater sense for a class to be used.

3. Usage

The script turns the data in the file into a hash. To use the script run the command followed by the desired path/source file, and then followed by the destination file (i.e source.rb).

Steps:

  • git clone this repo:

git clone [repo]
  • Install dependencies:

bundle install
  • Run the specs:

rspec -fd
  • Run the script by typing in the csv or xlsm file and path as a first argument, and the destination path as a second argument:

bin/csv-to-hash Central-Products-Lis.csv destination.rb

OR:

bin/csv-to-hash Central-Products-List_sheets.xlsm destination.rb
Other csv files that have the same Headers content structure as the one provided can be run like this:

EG: bin/csv-to-hash source.csv destination.rb OR bin/csv-to-hash somewhere/source.csv somewhere_else/destination.rb

4. Testing

The testing suite used is Rspec. The process followed was BDD. Simplcov is used to check for code coverage.

5. Approach

The strategy that I used was based on the fact that when we try to append a hash to a nested hash, if any keys coincide then the original hash will be overwritten with the incoming one. We therefore come up with a strategy to ensure that the data is not lost and overwritten when storing new data that may already be referred to by the same key.

5.1. Notes

My preliminary understanding was that it should look something like this
  {
    "Column D value"=>{
      "Column E value"=>{
        "Column I value"=>{
          :firmware_version=>    "Column J value",
          :smets_chts_version=>  "Column K value",
          :gbcs_version=>        "Column L value",
          :image_hash=>          "Column M value"
        }
      }
    }
  }

I was uncertain whether it was "valid" to use an array to store the inner hashes. Since the brief only made mention of a nested array. For the inner hashes, I therefore took an approach that checked whether the key was already in use by and if so swictched the value on the firmware key to be the key of the remaining key/value pairs of the same inner hash. This way we would not lose any of the data, and we would still have the speed of the hash table.

However, I then realised that as we got to the last value on the inner hash, we would either have to create a key for it using an index number (but we get that for free with arrays), or make it a value inside an array anyway. (I have left some examples of what I mean at the end of the file).

I felt the optimal thing to do was to place it inside an array early on, as we still dont lose any data. We can still leverage the speed of hash look up, only up to a certain point.

The benefit of this is that if at some stage we get to the last inner value, we will have to store the image_hash in an array, and it will now be truncated from it’s original "shape" anyway. So it is easier to retain the shape of the inner hash now, and if we need to perform a comparison on any of the inner hash we can do so iteratively using the array.

I therefore chose to focus on this line in the requirement: "The corresponding values at the innermost level are each a hash composed as […​]" Taking this second approach the innermost hash should never be changed and always retain the key value relationship already in place.

If Column I value is already in use, then that key will become the key for an array containing the different instances of the inner hash.

Like this
expected = {
  'Type_Communications' => {
    'Mfg_1' => {
      '1.1.0' =>  [
        {# If this is already in use
          firmware_version: '1.1.1',
          smets_chts_version: 'CHTS V1',
          gbcs_version: 'GBCS Version 1',
          image_hash: 'image1hash'
        },
        {  # And this one will be identical
          :firmware_version => "1a.1.1",
          :gbcs_version => "GBCS Version 1a",
          :smets_chts_version => "CHTS V1a",
          :image_hash => "image1Ahash"
        }
      ]
    }
  }
}

This an example of what I had originally thought to do but thought less optimal:

Example of what we can do if we make the value the key for first inner-most hash
    expected = {
     'Type_Communications' => {
       'Mfg_1' => {
         '1.1.0' => {
           '1a.1.1' => {
             "GBCS Version 1" => {
               :smets_chts_version => 'CHTS V1',
               :image_hash => 'image1hash'
             },
             "GBCS Version 1a" => {
               :smets_chts_version => "CHTS V1a",
               :image_hash => "image1Ahash"
             },
             "GBCS Version 1b" => {
               :smets_chts_version => 'CHTS V1b',
               :image_hash => 'image1Ahash'
             },
             "GBCS Version 1c" => {
              :smets_chts_version => 'CHTS V1b',
              :image_hash => 'image1Ahash'
            }
           }
         }
       }
     }
   }
I noticed it would need the array anyway when I got to this test:
  it 'It groups with duplicate headers/keys for "fifth inner-key" - 2 values' do

    input_csv =
    <<~CSV
      Version,Entry.number,Entry.status,Device_Type.name,Device_Model.manufacturer_identifier,Device_Model.model_identifier,Device_Model.hardware_version.version,Device_Model.hardware_version.revision,Device_Model.model_identifier_concatenated_with_hardware_version,Device_Model.firmware_version,SMETS_CHTS Version.Version_number_and_effective_date,GBCS Version.version_number,Manufacturer_Image.hash
      Version 1,1,Current,Type_Communications,Mfg_1,Model_1,1.0.0,AC,1.1.0,1a.1.1,CHTS V1,GBCS Version 1,image1hash
      Version 1,1,Current,Type_Communications,Mfg_1,Model_1,1.0.0,AC,1.1.0,1a.1.1,CHTS V1a,GBCS Version 1,image1Ahash
    CSV

    expected = {
     'Type_Communications' => {
       'Mfg_1' => {
         '1.1.0' => {
           '1a.1.1' => {
             "GBCS Version 1" => {
              "CHTS V1" => {
               :image_hash => 'image1hash'
              },
             {
              "CHTS V1a" => {
               :image_hash => "image1Ahash"
              }
            }
          }
         }
       }
     }
     }
    }

    actual = data_parse(input_csv)
    expect(actual).to eq expected
  end

# If the 'gbcs' version is already in use for an incoming hash, then we will have to store the image in some kind of array,
Thoughts on the end:
# The question is what is better to store the last value. we know that if this happens, it will only happen once, so speed at this stage is maybe not an issue.
# though they have specified a nest hash. so, that can mean,
# Thoughts on ending:

## OPTION 1

    expected = {
     'Type_Communications' => {
       'Mfg_1' => {
         '1.1.0' => {
           '1a.1.1' => {
             "GBCS Version 1" => {
              'CHTS V1' =>[{
               :image_hash => 'image1hash'
              },
              {
               :image_hash => "image2Ahash"
             }]
              }
             }
           }
         }
       }
     }
   }

## OPTION 2
# An index does this for free though
    expected = {
     'Type_Communications' => {
       'Mfg_1' => {
         '1.1.0' => {
           '1a.1.1' => {
             "GBCS Version 1" => {
              'CHTS V1' => {
                0 => 'image1hash',
                1 => "image2Ahash"
              }
              }
             }
           }
         }
       }
     }
   }

## OPTION 3

    expected = {
     'Type_Communications' => {
       'Mfg_1' => {
         '1.1.0' => {
           '1a.1.1' => {
             "GBCS Version 1" => {
              'CHTS V1' =>[
              'image1hash',
              "image2Ahash"
             ]
              }
             }
           }
         }
       }
     }
   }

About

nested hash assignement

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages