Ended up doing some preprocessing on those specifically. Not a perfect solution, but does what I needed so far...
import sys
# Jupyter/IPython only: install the dependencies into the running kernel's environment.
!{sys.executable} -m pip install flatten_json pandas boto3
import json
import csv
import pandas as pd
from flatten_json import flatten

def get_attribute(data, attribute, default_value):
    # Return data[attribute], or default_value if the key is missing or falsy.
    return data.get(attribute) or default_value

def flatten_with_keys(data, parent_object, tag_name, key_name, value_name, concat_name):
    data_list = data[parent_object]
    df_data = pd.DataFrame([])
    for item in data_list:
        data_tags = get_attribute(item, tag_name, None)
        # Skip items where tag_name is missing or empty.
        if data_tags:
            # Pivot the key-value list into a single row: keys become columns.
            df = pd.DataFrame(data_tags).T
            df.columns = df.loc[key_name]
            df = df.loc[value_name].to_frame(name=item[concat_name])
            df_data = pd.concat([df_data, df.T], sort=True)
    # Flatten everything except the key-value field, then index by concat_name.
    dic_flattened = [flatten(d, '.', root_keys_to_ignore={tag_name}) for d in data_list]
    df_flattened = pd.DataFrame(dic_flattened)
    df_flattened.set_index(concat_name, inplace=True)
    # Join the flattened columns with the pivoted key-value columns on the index.
    result = pd.concat([df_flattened, df_data], axis=1, sort=False)
    return result
This is an example for when you're working with AWS CLI output and want to flatten the VPC data. It takes a few extra steps, and I'm not sure it will work in every case.
We have a situation where most of the data is in "normal" JSON format, and then we have a "catch-all" field that is stored as key-value pairs.
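To make that shape concrete, here is a minimal sketch of such a record and of a preprocessing step that pivots the catch-all field into an ordinary dict before flattening. The field names (`VpcId`, `Tags`, `Key`, `Value`) and the helper `pivot_pairs` are assumptions for illustration only; they are not part of flatten_json.

```python
def pivot_pairs(record, field, key_name="Key", value_name="Value"):
    """Return a copy of record where record[field], a list of
    key/value dicts, is replaced by a plain {key: value} dict."""
    out = dict(record)  # shallow copy, the input record is left untouched
    pairs = record.get(field) or []  # missing or empty field becomes {}
    out[field] = {p[key_name]: p[value_name] for p in pairs}
    return out


# Hypothetical record: "normal" fields plus a catch-all list of pairs.
record = {
    "VpcId": "vpc-1",
    "CidrBlock": "10.0.0.0/16",
    "Tags": [
        {"Key": "Name", "Value": "prod"},
        {"Key": "Owner", "Value": "team-a"},
    ],
}

pivoted = pivot_pairs(record, "Tags")
print(pivoted)
```

Once the pairs are a plain dict, a regular flatten pass should treat them like any other nested object, so each key ends up as its own column rather than as indexed `Key`/`Value` entries.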
My anticipated output would look something like the following.
I'm just starting to think about a way to do this by augmenting the flatten_json code, but I'm curious whether anyone has already come up with a solution to this problem.
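One possible direction, sketched here as a stand-alone function rather than a change to flatten_json itself: a small recursive flattener that treats a list of key/value dicts as a mapping. `flatten_kv`, `_is_kv_list`, and their parameters are hypothetical names, the sample data is made up, and the index-based handling of ordinary lists only loosely mirrors what flatten_json does.

```python
def _is_kv_list(value, key_name, value_name):
    """True for a non-empty list whose items are all dicts holding
    exactly the two pair fields (for example "Key" and "Value")."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(v, dict) and set(v) == {key_name, value_name}
            for v in value
        )
    )


def flatten_kv(obj, separator=".", key_name="Key", value_name="Value", _prefix=""):
    """Flatten nested dicts and lists into one dict of dotted paths,
    pivoting key/value lists so each key becomes a path segment."""
    if _is_kv_list(obj, key_name, value_name):
        obj = {str(p[key_name]): p[value_name] for p in obj}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {_prefix: obj}  # scalar leaf
    flat = {}
    for k, v in items:
        path = f"{_prefix}{separator}{k}" if _prefix else str(k)
        flat.update(flatten_kv(v, separator, key_name, value_name, path))
    return flat


# Hypothetical input: a scalar, an ordinary list, and a key/value list.
vpc = {
    "VpcId": "vpc-1",
    "Subnets": ["a", "b"],
    "Tags": [{"Key": "Name", "Value": "prod"}],
}
flat = flatten_kv(vpc)
print(flat)
```

Note that in this sketch empty dicts and empty lists simply disappear from the output, and a list is only pivoted when every item has exactly the two pair fields; anything else falls back to index-based keys.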