Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Key Value Pairs in JSON #45

Open
montge opened this issue May 6, 2019 · 1 comment
Open

Key Value Pairs in JSON #45

montge opened this issue May 6, 2019 · 1 comment

Comments

@montge
Copy link

montge commented May 6, 2019

We have a situation where we most of the data is in "normal" json formats and then we have a "catch all" that is a key-value pairing.

{
"objects":[
          {
          "objectId": "one",
          "Tags": [
                {
                    "Key": "key1",
                    "Value": "value1"
                },
                {
                    "Key": "key2",
                    "Value": "valu2"
                },
                {
                    "Key": "key3",
                    "Value": "value3"
                }
            ]
          },
          {
          "objectId": "two",
          "Tags": [
                          {
                              "Key": "key1",
                              "Value": "value4"
                          },
                          {
                              "Key": "key3",
                              "Value": "value5"
                          },
                          {
                              "Key": "key4",
                              "Value": "value6"
                          }
                  ]
              }
          ]
}

My anticipated output would look something like the following.

ObjectId     Key1     Key2       Key3        Key4
one             value1   value2    value3     NaN
two             value4   NaN       value5     value6

Just starting to think about a way to do this by augmenting the flatten_json code, but was curious if anyone came up with a solution to this problem.

@montge
Copy link
Author

montge commented May 7, 2019

Ended up doing some preprocessing on those specifically. Not a perfect solution, but does what I needed so far...

import sys
!{sys.executable} -m pip install flatten_json pandas boto3

import pandas as pd
import json
import csv
from pandas.io.json import json_normalize

def get_attribute(data, attribute, default_value):
    return data.get(attribute) or default_value

def flatten_with_keys(data, parent_object, tag_name, key_name, value_name, concat_name):
    data_list = data[parent_object]
    df_data = pd.DataFrame([])
    for i in range(len(data_list)):
        data_tags = get_attribute(data_list[i], tag_name, None)
        # used if statement as to deal with if the tag_name doesn't exist.
        if data_tags:
            df = pd.DataFrame(data_list[i][tag_name]).T
            df.columns = df.loc[key_name]
            df = df.loc[value_name].to_frame(name=data_list[i][concat_name])
            df_data = pd.concat([df_data, df.T], sort=True)
        
    dic_flattened = [flatten(d,'.',root_keys_to_ignore={tag_name}) for d in data_list]
    df_flattened = pd.DataFrame(dic_flattened)
    df_flattened.set_index(concat_name, inplace=True)
    df_flattened
    result = pd.concat([df_flattened, df_data], axis=1, sort=False)
    return result

Example if you're running with aws cli where you want to flatten the vpc data. It's a bit of extra things, and not sure if it's going to work all the time.

import sys
!{sys.executable} -m pip install boto3

import boto3
ec2 = boto3.client('ec2')
vpc = ec2.describe_vpcs()
result_flattened = flatten_with_keys(vpc, 'Vpcs', 'Tags', 'Key', 'Value', 'VpcId')
result_flattened

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant