Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Anonymize member creation for activities if listed in erasure requests #2711

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
alter table "requestedForErasureMemberIdentities" drop constraint "unique_anonymized_member";

alter table "requestedForErasureMemberIdentities" add constraint "unique_anonymized_member" unique ("memberId", "platform", "type", "value");
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
alter table "requestedForErasureMemberIdentities" add column "memberId" uuid;
318 changes: 318 additions & 0 deletions services/apps/data_sink_worker/src/bin/anonymize-member.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,318 @@
import fs from 'fs'
import path from 'path'

import {
PriorityLevelContextRepository,
QueuePriorityContextLoader,
SearchSyncWorkerEmitter,
} from '@crowd/common_services'
import { DbStore, getDbConnection } from '@crowd/data-access-layer/src/database'
import { anonymizeUsername } from '@crowd/data-access-layer/src/gdpr'
import { getServiceChildLogger } from '@crowd/logging'
import { QueueFactory } from '@crowd/queue'
import { getRedisClient } from '@crowd/redis'
import { IMemberIdentity, MemberIdentityType } from '@crowd/types'

import { DB_CONFIG, QUEUE_CONFIG, REDIS_CONFIG } from '../conf'

const log = getServiceChildLogger('anonymize-member')

const processArguments = process.argv.slice(2)

if (processArguments.length === 0 || processArguments.length % 2 !== 0) {
log.error(
`
Expected argument in pairs which can be any of the following:
- ids "<memberId1>, <memberId2>, ..."
- email [email protected]
- name "John Doe"
- <platform> <value> (e.g. lfid someusername)
`,
)
process.exit(1)
}

setImmediate(async () => {
const manualCheckFile = `manual_check_member_ids.txt`
const dbConnection = await getDbConnection(DB_CONFIG())
const store = new DbStore(log, dbConnection)
const queueClient = QueueFactory.createQueueService(QUEUE_CONFIG())
const redisClient = await getRedisClient(REDIS_CONFIG())
const priorityLevelRepo = new PriorityLevelContextRepository(new DbStore(log, dbConnection), log)
const loader: QueuePriorityContextLoader = (tenantId: string) =>
priorityLevelRepo.loadPriorityLevelContext(tenantId)

const searchSyncWorkerEmitter = new SearchSyncWorkerEmitter(queueClient, redisClient, loader, log)
await searchSyncWorkerEmitter.init()

const pairs = []
for (let i = 0; i < processArguments.length; i += 2) {
pairs.push({
type: processArguments[i],
value: processArguments[i + 1],
})
}

log.info(
`Anonymizing member based on input data: [${pairs
.map((p) => `${p.type} "${p.value}"`)
.join(', ')}]`,
)

const idParams = pairs.filter((p) => p.type === 'ids')
const idsToAnonymize: string[] = []
for (const param of idParams) {
idsToAnonymize.push(...param.value.split(',').map((id) => id.trim()))
}

const memberDataMap: Map<string, any> = new Map()

Check warning on line 68 in services/apps/data_sink_worker/src/bin/anonymize-member.ts

View workflow job for this annotation

GitHub Actions / lint-format-services

Unexpected any. Specify a different type

if (idsToAnonymize.length > 0) {
for (const memberId of idsToAnonymize) {
try {
await store.transactionally(async (t) => {

Check warning on line 73 in services/apps/data_sink_worker/src/bin/anonymize-member.ts

View workflow job for this annotation

GitHub Actions / lint-format-services

't' is defined but never used
let memberData: any

Check warning on line 74 in services/apps/data_sink_worker/src/bin/anonymize-member.ts

View workflow job for this annotation

GitHub Actions / lint-format-services

Unexpected any. Specify a different type
if (memberDataMap.has(memberId)) {
memberData = memberDataMap.get(memberId)
} else {
memberData = await store
.connection()
.one(`select * from members where id = $(memberId)`, {
memberId,
})
memberDataMap.set(memberId, memberData)
}

// Get all identities for the member
const identities = await store
.connection()
.any(`select * from "memberIdentities" where "memberId" = $(memberId)`, { memberId })

log.info({ tenantId: memberData.tenantId }, 'ANONYMIZING MEMBER DATA...')

// Anonymize each identity and update the database
for (const identity of identities) {
const hashedUsername = anonymizeUsername(
identity.value,
identity.platform,
identity.type,
)

await anonymizeMemberInDb(store, identity, hashedUsername)
}

await searchSyncWorkerEmitter.triggerMemberSync(memberData.tenantId, memberId, true)
})
} catch (err) {
log.error(err, { memberId }, 'Failed to anonymize member!')
}
}
} else {
const nameIdentity = pairs.find((p) => p.type === 'name')
const otherIdentities = pairs.filter((p) => p.type !== 'name')

if (otherIdentities.length > 0) {
const conditions: string[] = []
const params: any = {}

Check warning on line 116 in services/apps/data_sink_worker/src/bin/anonymize-member.ts

View workflow job for this annotation

GitHub Actions / lint-format-services

Unexpected any. Specify a different type
let index = 0
for (const pair of otherIdentities) {
params[`value_${index}`] = pair.value
if (pair.type === 'email') {
conditions.push(
`(type = '${MemberIdentityType.EMAIL}' and lower(value) = lower($(value_${index})))`,
)
} else {
params[`platform_${index}`] = (pair.type as string).toLowerCase()
conditions.push(
`(platform = $(platform_${index}) and lower(value) = lower($(value_${index})))`,
)
}

index++
}

const query = `select * from "memberIdentities" where ${conditions.join(' or ')}`
const existingIdentities = await store.connection().any(query, params)

if (existingIdentities.length > 0) {
log.info(`Found ${existingIdentities.length} existing identities to anonymize.`)

for (const identity of existingIdentities) {
try {
await store.transactionally(async (t) => {

Check warning on line 142 in services/apps/data_sink_worker/src/bin/anonymize-member.ts

View workflow job for this annotation

GitHub Actions / lint-format-services

't' is defined but never used
const hashedUsername = anonymizeUsername(
identity.value,
identity.platform,
identity.type,
)

// Update memberIdentities table
await store.connection().none(
`update "memberIdentities"
set value = $(hashedValue)
where "memberId" = $(memberId)
and platform = $(platform)
and type = $(type)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
platform: identity.platform,
type: identity.type,
},
)

// Add to requestedForErasureMemberIdentities
await store.connection().none(
`insert into "requestedForErasureMemberIdentities"
(id, platform, type, value, "memberId")
values ($(id), $(platform), $(type), $(value), $(memberId))
on conflict do nothing`,
{
memberId: identity.memberId,
platform: identity.platform,
type: identity.type,
value: hashedUsername,
},
)

// Update activities
await store.connection().none(
`update activities
set username = $(hashedValue)
where "memberId" = $(memberId)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
},
)

await store.connection().none(
`update activities
set "objectMemberUsername" = $(hashedValue)
where "objectMemberId" = $(memberId)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
},
)

await searchSyncWorkerEmitter.triggerMemberSync(
identity.tenantId,
identity.memberId,
true,
)
})
} catch (err) {
log.error(err, { identity }, 'Failed to anonymize member identity!')
}
}
}
}

if (nameIdentity) {
const results = await store
.connection()
.any(`select id from members where lower("displayName") = lower($(name))`, {
name: nameIdentity.value.trim(),
})

if (results.length > 0) {
addLinesToFile(manualCheckFile, [
`name: ${nameIdentity.value}, member ids: [${results.map((r) => r.id).join(', ')}]`,
])
log.warn(
`Found ${results.length} members with name: ${
nameIdentity.value
}! Manual check required for member ids: [${results.map((r) => r.id).join(', ')}]!`,
)
}
}
}

process.exit(0)
})

function addLinesToFile(filePath: string, lines: string[]) {
try {
fs.mkdirSync(path.dirname(filePath), { recursive: true })
try {
fs.accessSync(filePath)
fs.appendFileSync(filePath, lines.join('\n') + '\n')
} catch (error) {
fs.writeFileSync(filePath, lines.join('\n') + '\n')
}
} catch (err) {
log.error(err, { filePath }, 'Error while writing to file!')
throw err
}
}
Comment on lines +235 to +248
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add path validation in addLinesToFile

The function should validate the file path to prevent directory traversal attacks.

 function addLinesToFile(filePath: string, lines: string[]) {
   try {
+    // Validate file path
+    const normalizedPath = path.normalize(filePath)
+    if (normalizedPath.includes('..')) {
+      throw new Error('Directory traversal detected in file path')
+    }
+
     fs.mkdirSync(path.dirname(filePath), { recursive: true })
     try {
       fs.accessSync(filePath)

Committable suggestion skipped: line range outside the PR's diff.


async function anonymizeMemberInDb(
store: DbStore,
identity: IMemberIdentity,
hashedUsername: string,
) {
// Update member details
// todo: cleanup original member data in members table
await store.connection().none(
`update members
set "displayName" = $(hashedValue)
where id = $(memberId)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
},
)

// Update memberIdentities table
await store.connection().none(
`update "memberIdentities"
set value = $(hashedValue)
where "memberId" = $(memberId)
and platform = $(platform)
and type = $(type)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
platform: identity.platform,
type: identity.type,
},
)

// Add to requestedForErasureMemberIdentities
await store.connection().none(
`insert into "requestedForErasureMemberIdentities"
(id, platform, type, value, "memberId")
values ($(id), $(platform), $(type), $(value), $(memberId))
on conflict do nothing`,
{
id: identity.memberId,
platform: identity.platform,
type: identity.type,
value: hashedUsername,
memberId: identity.memberId,
},
)

// Update activities table
await store.connection().none(
`update activities
set "objectMemberUsername" = $(hashedValue)
where "objectMemberId" = $(memberId)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
},
)

// Update activities table for member activities
await store.connection().none(
`update activities
set username = $(hashedValue)
where "memberId" = $(memberId)`,
{
hashedValue: hashedUsername,
memberId: identity.memberId,
},
)
}
Loading
Loading