Faker - Anonymize Address from dataframe


Tags: #faker #operations #snippet #database #dataframe

Author: Florent Ravenel

Description: This notebook anonymizes an address column in a pandas DataFrame by replacing each distinct address with a unique fake address generated by the Faker library.

Input

Import libraries

# Install Faker if it is missing (the "!" shell escape only works in Jupyter/IPython)
try:
    from faker import Faker
except ImportError:
    !pip install faker
    from faker import Faker
import pandas as pd

Setup Data

# Sample data: "x" and "z" stand in for real addresses; Peter and Lisa share "z"
data = [
    {"Name": "Mike", "Address": "x", "Score": 12},
    {"Name": "Peter", "Address": "z", "Score": 10},
    {"Name": "Lisa", "Address": "z", "Score": 11},
]
df = pd.DataFrame(data)
df

Setup Faker

Use Faker() to create and initialize a Faker generator. The generator produces data when you call a method named after the type of data you want, for example address() or name().

faker = Faker()

# Column to anonymize
col_name = "Address"

Model

Fake address

Calling methods through the generator's .unique property guarantees that every value returned is unique for this generator instance, so each distinct original address receives a different fake address.

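def fake_address(df, col_name):
    # Map each distinct original address to one unique fake address,
    # so rows that shared an address still share one after anonymization
    mapping = {address: faker.unique.address() for address in df[col_name].unique()}
    # Work on a copy so the input DataFrame is not modified in place
    df = df.copy()
    df["New Address"] = df[col_name].map(mapping)
    return df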

Output

Display result

df_fake_address = fake_address(df, col_name)
df_fake_address