Tags: #faker #operations #snippet #database #dataframe
Author: Florent Ravenel
Description: This notebook provides a way to anonymize address data from a dataframe using the Faker library.
try:
    from faker import Faker
except ImportError:
    !pip install faker
    from faker import Faker
import pandas as pd
data = [
    {"Name": "Mike", "Address": "x", "Score": 12},
    {"Name": "Peter", "Address": "z", "Score": 10},
    {"Name": "Lisa", "Address": "z", "Score": 11},
]
df = pd.DataFrame(data)
df
Use Faker() to create and initialize a generator, which produces fake data when you call the method named after the type of data you want, for example faker.address().
faker = Faker()
# Column to anonymize
col_name = "Address"
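Through the .unique property on the generator, you can guarantee that any generated values are unique for this specific instance. Here, each distinct value in the column gets its own fake address, and rows that share an original value share the same fake one.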
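def fake_address(df, col_name):
    # Build one fake address per distinct original value
    dict_names = {name: faker.unique.address() for name in df[col_name].unique()}
    # Map the originals to their fakes in a new column (modifies df in place)
    df["New Address"] = df[col_name].map(dict_names)
    return df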
df_fake_address = fake_address(df, col_name)
df_fake_address