Alfresco: storage volume estimation

Someone asked me yesterday how much storage space he should provide for an Alfresco installation, based on the raw document volume.

In my installation proposals I work with the following estimation (What to think about when you’re planning to install Alfresco):

  • 2x content
  • 0,2x search indexes

However, I’ve never checked these figures after an installation. So, let’s take as a sample a simple running installation set up with the default Alfresco installer.

Raw data

  • 29 GB
  • 25.377 documents
  • No custom types or aspects (these would increase database size)

Software directory

$ sudo du /opt/alfresco --max-depth=1 -h
3.4G    ./tomcat
205M    ./java
103M    ./common
603M    ./libreoffice
43G     ./alf_data
47G     .
  • About 1 GB is dedicated to base software (Java, GhostScript, ImageMagick and LibreOffice) but this space will not grow when using Alfresco
  • Tomcat directory requires 3,4 GB
  • Data directory (including files, database storage and SOLR indexes) is 43 GB
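To double-check the "about 1 GB" figure for base software, `du` can add up several directories at once with its `-c` option, which appends a grand-total line. A minimal sketch, assuming a POSIX shell with GNU or BSD `du`; `dir_total_mb` is just a hypothetical helper name:

```shell
# dir_total_mb DIR...: print the combined disk usage of the given
# directories in megabytes (rounded down).
# -s summarises each directory, -k reports 1 KiB blocks and
# -c appends a grand-total line, which is the last line of output.
dir_total_mb() {
  du -sck "$@" | tail -n 1 | awk '{ print int($1 / 1024) }'
}
```

Running `dir_total_mb /opt/alfresco/java /opt/alfresco/common /opt/alfresco/libreoffice` with enough permissions to read those folders should report something around 900 MB for the sample above (205 MB + 103 MB + 603 MB).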

Tomcat directory

$ sudo du /opt/alfresco/tomcat --max-depth=1 -h
1.1G    ./logs
1.3G    ./webapps
1006M   ./temp
3.4G    .
  • Looking inside Tomcat, we can see that the logs folder is already above 1 GB. If no control measure is applied, this directory will grow without limit
  • The web apps folder is also above 1 GB, and it will keep growing because every AMP deployment saves, by default, a backup of alfresco.war (143 MB) and share.war (60 MB)
  • The temp folder is also about 1 GB, but Alfresco keeps its contents under control by running a cleanup job every night (4.00 AM by default)

The following logrotate policy is enough to control log growth, keeping only seven days of history.

$ cat /etc/logrotate.d/alfresco
/opt/alfresco/tomcat/logs/catalina.out {
  copytruncate
  daily
  rotate 7
  compress
  missingok
  dateext
}

Any other files from that log directory (*.log and so on) can also be included in this logrotate policy.
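A single logrotate stanza can list several paths and glob patterns, so extending the policy is mostly a matter of adding them to the first line. The sketch below writes such a policy to a file of your choice so it can be reviewed before being copied to /etc/logrotate.d/alfresco. It assumes the extra *.log files live in the same Tomcat logs directory, and `write_alfresco_logrotate` is a hypothetical helper name:

```shell
# write_alfresco_logrotate DEST: write a logrotate policy covering
# catalina.out and every *.log file in the Tomcat logs directory.
# Quoting 'EOF' stops the shell from expanding the glob inside the heredoc.
write_alfresco_logrotate() {
  cat > "$1" <<'EOF'
/opt/alfresco/tomcat/logs/catalina.out /opt/alfresco/tomcat/logs/*.log {
  copytruncate
  daily
  rotate 7
  compress
  missingok
  dateext
}
EOF
}
```

After `write_alfresco_logrotate /tmp/alfresco.logrotate`, running `logrotate -d /tmp/alfresco.logrotate` shows what would be rotated without touching any file. Keep in mind that Tomcat already date-stamps some of its own logs (catalina.YYYY-MM-DD.log, for instance), so you may prefer listing specific files instead of the glob.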

Besides, the temp folder can be purged manually while Alfresco is stopped, and the web app backups (alfresco.war-111111111111.bak and share.war-111111111111.bak) can be moved out of the Tomcat directory.
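Both housekeeping tasks can be scripted. A minimal sketch, assuming the default installer layout; `TOMCAT_DIR`, `WAR_BACKUP_DIR` (with its default /var/backups/alfresco-wars), `move_war_backups` and `purge_tomcat_temp` are hypothetical names chosen for illustration. It relies on `find -mindepth` and `-delete`, which GNU and BSD find support although POSIX does not require them:

```shell
# Defaults follow the installer layout shown above; override as needed.
TOMCAT_DIR="${TOMCAT_DIR:-/opt/alfresco/tomcat}"
# Hypothetical archive location: ideally on another filesystem.
WAR_BACKUP_DIR="${WAR_BACKUP_DIR:-/var/backups/alfresco-wars}"

# move_war_backups: move the *.war-*.bak copies left by AMP deployments
# out of webapps into $WAR_BACKUP_DIR, leaving the live .war files alone.
move_war_backups() {
  mkdir -p "$WAR_BACKUP_DIR" || return 1
  for bak in "$TOMCAT_DIR"/webapps/*.war-*.bak; do
    [ -e "$bak" ] || continue   # the glob matched nothing
    mv "$bak" "$WAR_BACKUP_DIR"/ || return 1
  done
}

# purge_tomcat_temp: delete everything inside Tomcat's temp directory,
# hidden files included, but keep the directory itself.
# Only run this while Alfresco is stopped.
purge_tomcat_temp() {
  [ -d "$TOMCAT_DIR/temp" ] || return 1
  find "$TOMCAT_DIR/temp" -mindepth 1 -delete
}
```

Stop Alfresco first, then source the file from a shell with write access to the Tomcat directory and call both functions.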

Data directory

$ sudo du /opt/alfresco/alf_data --max-depth=1 -h
2.5G    ./solr4Backup
482M    ./postgresql
32G     ./contentstore
1.3G    ./solr4
6.3G    ./contentstore.deleted
  • The SOLR backup performed by Alfresco takes 2,5 GB. This feature can be disabled to save this space, or the backup directory can be located on another filesystem
  • PostgreSQL data is 0,5 GB. The more metadata each document has, the bigger this figure will be
  • The Content Store is 32 GB which, compared to the original 29 GB, shows that not many versions per document are stored. The more versions are stored, the bigger this figure will be
  • SOLR4 indexes take 1,3 GB, which is less than expected (0,2 x 29 GB = 5,8 GB), but this seems reasonable given the low use of metadata in this installation
  • The deleted Content Store (contentstore.deleted) is 6,3 GB and depends directly on how many delete operations are performed in the system

Comparing theory with practice

Following our own rules, the expected volumes would be:

  • 2 x 29 GB = 58 GB for data storage
  • 0,2 x 29 GB = 5,8 GB for SOLR indexes

In our sample system, we found the following:

  • 32 GB + 6,3 GB = 38,3 GB for data storage
  • 1,3 GB for SOLR indexes

As said before, real figures can differ depending on metadata usage and versioning practices. However, our initial assumptions fit this installation well, leaving some margin (38,3 GB measured versus 58 GB estimated for data storage).

Database storage will depend on the database software selected, but this is the PostgreSQL data stored in our system:

  • 0,5 GB for database data storage

This fits comfortably within a 0,1x rule (0,1 x 29 GB = 2,9 GB).
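The ratios discussed in this section (measured volume divided by raw volume) are easy to recompute for any other installation. A minimal sketch using awk, since POSIX shell arithmetic is integer-only; `storage_ratio` is a hypothetical helper, and its arguments use '.' as decimal separator:

```shell
# storage_ratio MEASURED_GB RAW_GB: print MEASURED / RAW with two decimals.
# Fails (exit status 1, no output) if RAW_GB is zero.
storage_ratio() {
  awk -v m="$1" -v r="$2" 'BEGIN {
    if (r == 0) exit 1
    printf "%.2f\n", m / r
  }'
}
```

For our sample, `storage_ratio 38.3 29` prints 1.32, `storage_ratio 1.3 29` prints 0.04 and `storage_ratio 0.5 29` prints 0.02, all below the 2x, 0,2x and 0,1x rules respectively.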

Moreover, there is some other storage space to think about outside the data directory, for base software and Tomcat. This volume will not grow as long as the control policies described above are applied:

  • 1 GB for base software
  • 3,5 GB for Tomcat

Conclusion

After analyzing this live system, our initial formulae can be extended with some additional figures:

  • 2x content
  • 0,2x search indexes
  • 0,1x database data (PostgreSQL)
  • 5 GB for base software and Tomcat (1 GB + 3,5 GB, rounded up)
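Put together, these rules can be turned into a small calculator. Again, this is a sketch and not an official Alfresco tool: `estimate_alfresco_storage` is a hypothetical name, the 5 GB figure is taken as fixed, and the argument uses '.' as decimal separator:

```shell
# estimate_alfresco_storage RAW_GB: apply the sizing rules above to a raw
# document volume (in GB) and print each component plus the total.
estimate_alfresco_storage() {
  [ -n "$1" ] || return 1
  awk -v raw="$1" 'BEGIN {
    content = 2.0 * raw   # content store, versions and deleted content
    idx     = 0.2 * raw   # SOLR indexes ("index" is an awk builtin name)
    db      = 0.1 * raw   # database data (PostgreSQL)
    base    = 5           # base software and Tomcat, fixed
    printf "content:  %.1f GB\n", content
    printf "indexes:  %.1f GB\n", idx
    printf "database: %.1f GB\n", db
    printf "software: %.1f GB\n", base
    printf "total:    %.1f GB\n", content + idx + db + base
  }'
}
```

For the 29 GB sample, `estimate_alfresco_storage 29` reports 58.0 GB for content, 5.8 GB for indexes, 2.9 GB for the database and 71.7 GB in total.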

And remember also to provide some remote storage for backup operations.

Published by angelborroy

Understanding software.
