"Clean Up Files" Feature #1023

Description

@natanrolnik
Contributor

Make sure these boxes are checked before submitting your issue -- thanks for reporting issues back to Parse Server!

  • You've met the prerequisites.
  • You're running the latest version of Parse Server.
  • You've searched through existing issues. Chances are that your issue has been reported or resolved before.

One of the features I liked on hosted Parse was the Clean Up Files button in the settings. With it, every file stored in S3, for example, that was no longer referenced from a PFFile would be deleted. I liked it especially because it allowed us to save on unused/unneeded resources.

Maybe a REST call using the master key would be enough initially? In the future, it could possibly be integrated with parse-dashboard.

I know it's lower priority compared to the features/fixes that are being developed, but that would be great to have.

Activity

changed the title from "Allow deleting unused files" to "Clean Up Files" on Mar 15, 2016
changed the title from "Clean Up Files" to "Clean Up Files" Feature on Mar 15, 2016
gfosco commented on Mar 17, 2016

@gfosco
Contributor

This would be pretty difficult actually, and would need to be built for each specific Files adapter. Right now, there's no 'listing' of what files exist through the adapter.
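
To illustrate the missing piece, here is a minimal sketch of what a listing capability could look like on an adapter. The `InMemoryFilesAdapter` class and its `listFiles` method are hypothetical and exist only for this example; `listFiles` is an assumption, not part of the Parse Server files adapter interface, and a real adapter (S3, GridFS, etc.) would need its own implementation.

```javascript
// Hypothetical in-memory adapter used only for illustration.
// `listFiles` is an ASSUMED addition, not an existing Parse Server API.
class InMemoryFilesAdapter {
  constructor() {
    this.files = new Map(); // filename -> Buffer
  }

  async createFile(filename, data) {
    this.files.set(filename, Buffer.from(data));
  }

  async deleteFile(filename) {
    this.files.delete(filename);
  }

  // Yield stored filenames one at a time so that a large bucket
  // does not have to be loaded into memory all at once.
  async *listFiles() {
    for (const filename of this.files.keys()) {
      yield filename;
    }
  }
}

// Drain the listing into an array (fine for small stores only).
async function collectFilenames(adapter) {
  const names = [];
  for await (const name of adapter.listFiles()) {
    names.push(name);
  }
  return names;
}
```

An async iterator is used here because a storage backend typically returns listings page by page; whether that fits a given adapter is a design question for that adapter.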

natario1 commented on Mar 17, 2016

@natario1

+1, agree with the need.

ckarmy commented on Jul 18, 2016

@ckarmy

Is it possible to clean up the unused files stored in GridStore now?

yorkwang commented on Sep 28, 2016

@yorkwang

+1, It's a very useful feature.

Lokiitzz commented on Oct 3, 2016

@Lokiitzz

+1, It would be nice.

umair6 commented on Oct 19, 2016

@umair6

+1

abdulwasayabbasi commented on Oct 19, 2016

@abdulwasayabbasi

+1 very much needed

JoseVigil commented on Nov 21, 2016

@JoseVigil

+1

55 remaining items

davimacedo commented on Nov 11, 2020

@davimacedo
Member

I think the current script is more of a proof of concept. It is not scalable and would almost certainly crash or block the DB for an unacceptable amount of time on any production system of serious size.

That's why I wouldn't put the script in the API. It would only be a matter of time before people started complaining about the script not working. The same happened with the push notifications system. It took a long time to get a scalable process because previously a single Parse Server instance was trying to handle all pushes.

For this to be scalable in the API, we'd need to take an approach similar to the one used for push notifications: break the files into small sets, put those sets on a queue, and run multiple processes that consume the sets and process them one by one. Even so, we are talking about something that will be complex to write and also to deploy.
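
The batching idea can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions: the queue is a plain in-memory array and the "workers" are concurrent async functions, whereas the approach described above would use an external queue and separate processes. The names `toBatches`, `processInBatches`, and `handleBatch` are made up for this example.

```javascript
// Split a list into fixed-size batches.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Run `workerCount` consumers that pull batches off a shared in-memory queue.
// ASSUMPTION: a real deployment would replace the array with an external
// message queue and the async functions with separate processes.
async function processInBatches(items, batchSize, workerCount, handleBatch) {
  const queue = toBatches(items, batchSize);
  let processed = 0;

  async function worker() {
    // The length check and shift() run without an await in between,
    // so two workers never take the same batch.
    while (queue.length > 0) {
      const batch = queue.shift();
      await handleBatch(batch);
      processed += batch.length;
    }
  }

  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return processed;
}
```

Keeping each batch small bounds how long any single database query or storage call runs, which is the property the comment above is after.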

mtrezza commented on Nov 11, 2020

@mtrezza
Member

Good points. @dblythy can you find anything reusable in the files utils repo that has been mentioned before?

dblythy commented on Nov 11, 2020

@dblythy
Member

I had a quick look through it and it seems to use a search algorithm similar to the one I wrote (look up the schema and look for “File”). I can take a more detailed look at that and at how the push notifications approach is done, and work towards a cleanup feature along those lines.
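
As a rough illustration of the "look up the schema and look for File" step, the sketch below scans schema objects for columns of type `File`. It assumes schemas in the shape `{ className, fields: { fieldName: { type } } }`; the function name `findFileFields` is made up for this example and is not taken from either script mentioned above.

```javascript
// Return every { className, fieldName } pair whose column type is 'File'.
// ASSUMPTION: `schemas` is an array of { className, fields } objects where
// each field definition carries a `type` string.
function findFileFields(schemas) {
  const result = [];
  for (const { className, fields } of schemas) {
    for (const [fieldName, definition] of Object.entries(fields)) {
      if (definition.type === 'File') {
        result.push({ className, fieldName });
      }
    }
  }
  return result;
}

// Example input with one File column.
const exampleSchemas = [
  { className: '_User', fields: { username: { type: 'String' }, avatar: { type: 'File' } } },
  { className: 'Post', fields: { title: { type: 'String' }, meta: { type: 'Object' } } },
];
const exampleFileFields = findFileFields(exampleSchemas);
```

Note that this only finds top-level `File` columns; files stored inside `Object` or `Array` columns are invisible to a schema scan.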

c0sm1cdus7 commented on May 12, 2021

@c0sm1cdus7

Was this ever implemented?

dblythy commented on May 12, 2021

@dblythy
Member

The main reason this stalled is that figuring out whether a file is an "orphan" (that is, not associated with any parent object) depends entirely on how files are associated with objects. Because a file can be set to object.field or object.field[2].key.field[2], or can be associated through relations, pointers, etc., it becomes rather difficult for a built-in algorithm to decide whether a file should be deleted or not.
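
To make the nesting problem concrete, here is a small sketch that walks one object's JSON and collects every file name it finds at any depth. It assumes files appear in the JSON representation as `{ "__type": "File", "name": ..., "url": ... }`; the helper name `collectFileNames` is made up for this example. It does not follow pointers or relations, which would require additional queries.

```javascript
// Recursively collect file names from a plain JSON value.
// Handles files nested in arrays and objects, e.g. object.field[2].key.field[2].
function collectFileNames(value, found = new Set()) {
  if (Array.isArray(value)) {
    value.forEach(item => collectFileNames(item, found));
  } else if (value !== null && typeof value === 'object') {
    if (value.__type === 'File' && typeof value.name === 'string') {
      found.add(value.name);
    } else {
      Object.values(value).forEach(item => collectFileNames(item, found));
    }
  }
  return found;
}

// Example: one file at the top level, one deeply nested.
const exampleObject = {
  avatar: { __type: 'File', name: 'a.png', url: 'http://example.com/a.png' },
  gallery: [0, 1, { key: { items: [null, null, { __type: 'File', name: 'b.png', url: 'http://example.com/b.png' }] } }],
};
const exampleNames = collectFileNames(exampleObject);
```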

If you're familiar with how your Parse Server determines file associations, you can do something similar to this:

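```javascript
// Parse is a global in Cloud Code; it is required here so the snippet stands alone.
const Parse = require('parse/node');
const Config = require('./node_modules/parse-server/lib/Config');

async function listStoredFiles() {
  // 'appId' is a placeholder for your application ID.
  const config = Config.get('appId');
  // _getBucket() is a private, adapter-specific method and may change between versions.
  const bucket = await config.database.adapter._getBucket();
  const fileDocs = await bucket.find().toArray();
  // Build a Parse.File for every stored filename.
  return fileDocs.map(({ filename }) => {
    const file = new Parse.File(filename);
    file._url = config.filesController.adapter.getFileLocation(config, filename);
    return file;
  });
}
// Loop through the returned files and check whether each has any association. If not, delete it.
```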
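
The final "check association, otherwise delete" step comes down to a set difference once both lists are known. The sketch below assumes you already have the stored file names and the names referenced by your objects as plain arrays; `findOrphans` is a made-up helper, and it only reports candidates rather than deleting anything.

```javascript
// Return the stored file names that no object references.
function findOrphans(storedNames, referencedNames) {
  const referenced = new Set(referencedNames);
  return storedNames.filter(name => !referenced.has(name));
}

// Example: 'old.png' is stored but no longer referenced.
const orphanCandidates = findOrphans(
  ['a.png', 'b.png', 'old.png'],
  ['a.png', 'b.png']
);
```

Reviewing the candidate list before deleting is a cheap safeguard, since a missed association means data loss.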

dblythy commented on Aug 20, 2021

@dblythy
Member

I think the main conceptual challenges here are:

  1. How to work out which files are orphans, which could be difficult with more complex schemas, such as nested files.
  2. How to make the feature backwards compatible so that references for existing files are tracked.

I'm thinking:

  1. We could let the developer write some sort of Parse.Cloud.checkFileParents handler, in which the developer is responsible for querying existing data for the file.
  2. We could begin tracking file references from the new version onward, and encourage developers to manually write queries and saves to build the "FileObject" storage for existing files.
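
A developer-supplied check along the lines of the first idea could look roughly like this. Everything here is hypothetical: `Parse.Cloud.checkFileParents` does not exist in Parse Server, so the sketch uses a stand-alone registry (`registerFileParentCheck`, `isOrphan`) and an in-memory array in place of real queries.

```javascript
// Hypothetical registry of developer-written checks. Each check receives a
// file name and resolves to true if some object still references the file.
const fileParentChecks = [];

function registerFileParentCheck(check) {
  fileParentChecks.push(check);
}

async function isOrphan(fileName) {
  // Be conservative: with no checks registered, never report a file as orphaned.
  if (fileParentChecks.length === 0) return false;
  for (const check of fileParentChecks) {
    if (await check(fileName)) return false;
  }
  return true;
}

// Example check. In practice this would be a database query written by the
// developer; an in-memory array stands in for it here.
const profiles = [{ user: 'ann', avatar: 'used.png' }];
registerFileParentCheck(async fileName =>
  profiles.some(profile => profile.avatar === fileName)
);
```

Putting the query in the developer's hands sidesteps the nested-schema problem, at the cost of making correctness their responsibility.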

mtrezza commented on Aug 20, 2021

@mtrezza
Member

Good analysis @dblythy. Could we break this down into a minimum viable feature with some limitations?

  • The first version doesn't have to cover all special cases of how to reference a file, but only the basic way of file pointers.
  • Later on we can just add an API to manually change the counter; it would be the developer's responsibility to manipulate the counter correctly, according to their use case; this also gives the most feature versatility.
  • Later on we could also cover some special reference cases (pointer arrays, or whatever).
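
A minimal sketch of such a counter, covering only the basic case of a single file field changing on save, might look as follows. The class `FileReferenceCounter` and its methods are made up for this example and keep counts in memory; a real feature would persist them and hook into object saves and deletes.

```javascript
// In-memory reference counter keyed by file name (illustrative only).
class FileReferenceCounter {
  constructor() {
    this.counts = new Map(); // file name -> number of references
  }

  // Called when a file is uploaded; it starts with zero references.
  register(fileName) {
    if (!this.counts.has(fileName)) this.counts.set(fileName, 0);
  }

  increment(fileName) {
    this.counts.set(fileName, (this.counts.get(fileName) || 0) + 1);
  }

  decrement(fileName) {
    this.counts.set(fileName, Math.max(0, (this.counts.get(fileName) || 0) - 1));
  }

  // Called when a file field changes from oldFileName to newFileName.
  // Pass null for "no file" (e.g. field set for the first time, or unset).
  onFieldChange(oldFileName, newFileName) {
    if (oldFileName === newFileName) return;
    if (newFileName) this.increment(newFileName);
    if (oldFileName) this.decrement(oldFileName);
  }

  // Files that nothing references; these are cleanup candidates.
  unreferenced() {
    return [...this.counts]
      .filter(([, count]) => count === 0)
      .map(([fileName]) => fileName);
  }
}
```

The public `increment` and `decrement` methods correspond to the manual API mentioned in the second bullet: a developer with an unusual reference scheme could call them directly.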

dblythy commented on Aug 20, 2021

@dblythy
Member

I think that’s a good point. For most users, being able to see a collection of their files (uploaded by, view count, etc.) in the dashboard would probably be an improvement.

We could note that the counter will only be accurate for simple data schemas, and leave the deleting of files up to the developer.

There could be the potential to bake in some common use cases, such as unique profile picture management.

mtrezza commented on Aug 20, 2021

@mtrezza
Member

> There could be the potential to bake in some common use cases, such as unique profile picture management.

I like that very much. For example, using hashes to avoid uploading the same file multiple times and referencing the existing file instead? I think that would be a very practical use case. Yes, we can see this has a lot of potential, so getting a very basic first version of the feature released would be a good start.
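
The hash-based deduplication idea can be sketched with Node's built-in `crypto` module. This is an illustration under assumptions: `DedupingFileStore` is a made-up in-memory class, the stored-name format is arbitrary, and a real implementation would persist the hash index and store the bytes through a files adapter.

```javascript
const crypto = require('crypto');

// In-memory store that reuses an existing file when the same bytes are uploaded again.
class DedupingFileStore {
  constructor() {
    this.nameByHash = new Map(); // sha256 hex digest -> stored file name
  }

  upload(name, data) {
    const hash = crypto.createHash('sha256').update(data).digest('hex');
    const existingName = this.nameByHash.get(hash);
    if (existingName) {
      // Same content already stored: reference it instead of storing a copy.
      return { name: existingName, reused: true };
    }
    // Arbitrary naming scheme for this example: hash prefix plus original name.
    const storedName = `${hash.slice(0, 8)}_${name}`;
    this.nameByHash.set(hash, storedName);
    return { name: storedName, reused: false };
  }
}
```

With deduplication in place a single stored file can have many referrers, so it only makes sense together with some form of reference tracking before anything is deleted.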

added bounty:$100 (Bounty applies for fixing this issue, Parse Bounty Program) on Oct 5, 2021
added type:feature (New feature or improvement of existing feature) and removed on Dec 6, 2021
Metadata

Assignees: No one assigned
Labels: bounty:$100 (Bounty applies for fixing this issue, Parse Bounty Program), type:feature (New feature or improvement of existing feature)
Participants: @funkenstrahlen, @hramos, @flovilmart, @gfosco, @respectTheCode

"Clean Up Files" Feature · Issue #1023 · parse-community/parse-server