HCL Connections Ideas Portal

Welcome to the HCL Connections Product Ideas Lab! The place where you can submit product ideas and enhancement requests. We encourage you to participate by voting on, commenting on, and creating new ideas. All new ideas will be evaluated by the HCL Product Management & Engineering teams, and the next steps will be communicated. While not all submitted ideas will be executed upon, community feedback will play a key role in influencing which ideas are implemented and when.

For more information and upcoming events, please visit our HCL Connections page.

Status Under Consideration
Categories 03. Files
Created by Guest
Created on Aug 9, 2018

Limit the Number of File Versions to Retain

Files that are worked on regularly often accumulate several hundred versions that are no longer needed. A typical user will not delete the old versions manually, so they needlessly use up storage space. It would be great if the number of retained versions could be limited, to 100 for example, with older versions deleted automatically.

  • Guest (Feb 3, 2023)

    Here is an example of the "info" tab of the file I mentioned in my last comment:

    • The file was created in 2018 and multiple people are actively working on it (the last change was yesterday)

    • The current version is 936 (so nearly 1,000 versions so far)

    • The latest version is 63.4 MB

    • All versions together require 49.6 GB of space

  • Guest (Feb 3, 2023)

    Hello,

    I still can't find any solution for this, which surprises me, since it means that long-lived, frequently used files will accumulate a lot of dead data in CNX. This may be negligible for smaller files, but we all know how tools like MS Office are (mis)used, and those files are handled as binaries.

    Examples of the problem

    For example, our users have an extremely large Excel file of about 64 MB, which has been frequently used and updated since 2017. There are nearly 1,000 versions of this single file, and together those versions require about 50 GB of storage.

    This is just a simple example. We also have MS PowerPoint presentations where this is even worse, since large media files like (high-resolution) images and videos are embedded. As a result, each version of those files may be hundreds of MB. With a handful of files each creating tens of GB of versions, this adds up to hundreds of GB or even several TB (depending on file size and user activity), and at least a significant part of that data is not really used or required.

    How MS Office files are handled

    It's also notable that those files are handled as binaries, which is part of the problem: MS Office files are compressed archives containing XML files and the embedded objects. If CNX extracted those archives and compared their contents with the previous version, a lot of wasted space could be saved, since such a delta would only contain the changed items (like some text) without storing another copy of e.g. images or other objects that were already on the server in the previous version.
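
    CNX does not do this today, but as a minimal illustration of the idea, the following Python sketch (standard library only) compares two versions of an Office file and reports which parts are byte-identical, based on the CRC-32 checksums stored in the ZIP directory. The file names are made-up examples.

        import zipfile

        def unchanged_members(old_path, new_path):
            # Office files are ZIP archives; compare the CRC-32 and size
            # recorded for each member (XML parts, images, ...) to find
            # parts that did not change between the two versions.
            with zipfile.ZipFile(old_path) as old, zipfile.ZipFile(new_path) as new:
                old_index = {i.filename: (i.CRC, i.file_size) for i in old.infolist()}
                same, changed = [], []
                for info in new.infolist():
                    if old_index.get(info.filename) == (info.CRC, info.file_size):
                        same.append(info.filename)
                    else:
                        changed.append(info.filename)
            return same, changed

        # "report.xlsx" / "report_v2.xlsx" are hypothetical example files.
        same, changed = unchanged_members("report.xlsx", "report_v2.xlsx")
        print(len(same), "parts unchanged,", len(changed), "parts changed")

    The unchanged parts (typically the embedded media) are exactly the data a delta-aware store would not need to save again.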

    But this is another topic; it's just important to understand, because other tools like Git deduplicate efficiently, so nobody would expect one large commit to blow up all subsequent ones.

    I see MS Office files as a common but special case, since not every file problem can be solved that way. For example, video files that are uploaded directly can't be stored more efficiently like this. So we need useful limits for file versions anyway, even if CNX handled those Office files more cleverly.

    Possible solutions for CNX

    Automatic

    Imho, the main point is to have some kind of limits that can be adjusted (I'd prefer files-config.xml). A cron job would check those limits regularly; if a file exceeds them, older versions are deleted until it fits the policy again.

    It's not so easy to define such a policy. I don't want to forcibly delete things. Having access to file versions is very useful, and those limits shouldn't kill data that someone still needs. They should only delete very old/redundant data that is certainly not needed any more.

    I'd define this as a set of conditions, like maximum age, maximum number of versions, and maximum space used per file. This avoids unnecessarily deleting versions of small files with many versions, while huge files may keep only a few versions because of their size. If those conditions are configurable, everyone can choose the settings that fit their environment best.

    It would also be useful to specify a minimum number of versions to keep. This is important in all cases (also for manual cleanup), because some files are archived, so nobody works on them (no writes) any more. If we simply delete versions older than e.g. a year, those files won't have any versions left after that time. Users may not want that if they still need to track changes after that time.
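
    To make this concrete, here is a minimal Python sketch of how a cron-driven check could combine these conditions. This is not an existing CNX feature; the limit names and defaults are purely illustrative, not a real files-config.xml schema.

        from dataclasses import dataclass
        from datetime import datetime, timedelta

        @dataclass
        class VersionPolicy:
            # Illustrative limits, not a real files-config.xml schema.
            max_age: timedelta = timedelta(days=365)  # versions older than this may go
            max_versions: int = 100                   # keep at most this many versions
            max_bytes: int = 10 * 1024**3             # cap on total space per file
            min_versions: int = 10                    # but never drop below this many

        def versions_to_delete(versions, policy, now=None):
            # versions: list of (created: datetime, size_bytes: int), newest first.
            # Returns the indexes of the versions the policy would delete.
            now = now or datetime.now()
            total = sum(size for _, size in versions)
            doomed = []
            for idx in range(len(versions) - 1, -1, -1):  # walk oldest to newest
                created, size = versions[idx]
                if len(versions) - len(doomed) <= policy.min_versions:
                    break  # the minimum-to-keep condition always wins
                too_old = now - created > policy.max_age
                too_many = len(versions) - len(doomed) > policy.max_versions
                too_big = total > policy.max_bytes
                if too_old or too_many or too_big:
                    doomed.append(idx)
                    total -= size
            return doomed

    The key design point is that min_versions overrides the other limits, so archived files that nobody writes to any more still keep a usable history.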

    Manual

    An automated solution may not be suitable for everyone, e.g. because they need to archive things. Additionally, it should therefore be possible to clean up large files with many versions manually, using tools like wsadmin.

    For example, for our Excel file with nearly 1,000 versions, we could delete all versions older than 2022-01-01. Or even better: thin them out, as is done with backup rotation, so that for versions older than a certain date only e.g. one version per week is kept. That way older versions still exist if some users need to look at them.

    More examples (a sketch of one such rule follows after this list):

    • Delete versions older than timestamp X

    • Delete versions older than timestamp X; for those older versions, keep Y per week

    • Delete all old versions, but keep at least X versions

    • Delete version 300 to 600

    • Delete old versions to meet a maximum amount of space for all versions (e.g. we have 50 GB of versions, and old versions are deleted to reduce this to 30 GB)
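
    As a sketch of the "keep Y per week" idea (again Python and purely illustrative; no such wsadmin command exists today): given the version timestamps, keep everything newer than a cutoff and, among the older versions, only the newest one per calendar week.

        from datetime import datetime

        def weekly_thin_out(versions, cutoff):
            # versions: list of datetime objects, newest first.
            # Keep all versions newer than `cutoff`; of the older ones,
            # keep only the newest per ISO calendar week.
            kept_weeks = set()
            doomed = []
            for created in versions:
                if created >= cutoff:
                    continue  # recent versions are always kept
                week = created.isocalendar()[:2]  # (year, week number)
                if week in kept_weeks:
                    doomed.append(created)  # this week already has a survivor
                else:
                    kept_weeks.add(week)
            return doomed

        # e.g. weekly_thin_out(timestamps, datetime(2022, 1, 1))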

    Getting an overview

    As an administrator, it's important to see how big this issue is in my environment. So we should have a dashboard that lists files by their total space (all versions), their version count, and the dates of the first and last versions. Ideally, versions could be deleted right there based on conditions; but at minimum it should provide an overview.

    Currently, this is only possible by reverse-engineering the database and writing SQL queries, especially against the FILES.MEDIA and FILES.MEDIA_REVISION tables. That shouldn't be required for such a basic feature.
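
    For reference, a rough sketch of such an overview query, run through a standard Python DB-API connection. The two table names are the ones mentioned above; the column names (MEDIA_ID, LABEL, CONTENT_SIZE, CREATE_DATE) are guesses and must be checked against the actual CNX schema before use.

        OVERVIEW_SQL = """
        SELECT m.MEDIA_ID, m.LABEL,
               COUNT(r.MEDIA_ID)   AS version_count,
               SUM(r.CONTENT_SIZE) AS total_bytes,
               MIN(r.CREATE_DATE)  AS first_version,
               MAX(r.CREATE_DATE)  AS last_version
          FROM FILES.MEDIA m
          JOIN FILES.MEDIA_REVISION r ON r.MEDIA_ID = m.MEDIA_ID
         GROUP BY m.MEDIA_ID, m.LABEL
         ORDER BY total_bytes DESC
        """

        def print_overview(conn, limit=20):
            # List the files whose versions consume the most space.
            cur = conn.cursor()
            cur.execute(OVERVIEW_SQL)
            for media_id, label, versions, total, first, last in cur.fetchmany(limit):
                print(f"{label}: {versions} versions, "
                      f"{total / 1024**3:.1f} GB ({first} .. {last})")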

    End-users

    Additionally, it would be even better if end users could manage the versions of specific files themselves. That's not something everyone needs, but it is for people working on large files with many versions.

    Currently, we can't navigate to specific versions easily: the web UI shows the latest 10 versions with a "load more" button. In my 1,000-version example, the user has to click this link 100 times (!) just to see the oldest version entry, and then has to delete every single version by hand.

    This is not practical. It would be better to have pagination here, so we can easily navigate to the last page. And if users want batch deletion, they need tools like those described above, e.g. to delete versions older than a specific date, or to delete versions 300 to 600.

  • Guest (Jan 30, 2020)

    I agree with that.

    I think it could be a general parameter per company, per user, or per community.