Alfresco, counting more than 1000 elements

Many people need to count elements inside the repository. In a typical repository, having more than 1,000 elements of the same type or aspect is a common scenario.

This blog post presents several ways of counting elements in an Alfresco repository.

Problem statement

How many nodes in the repository have the ust:businessDocument aspect?

Let’s assume the following content model: a base aspect named ust:businessDocument and two inheriting aspects named ust:inboundDoc and ust:outboundDoc.

<aspects>
    <aspect name="ust:businessDocument">
        <properties>
            <property name="ust:docDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
    <aspect name="ust:inboundDoc">
        <parent>ust:businessDocument</parent>
        <properties>
            <property name="ust:receivedDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
    <aspect name="ust:outboundDoc">
        <parent>ust:businessDocument</parent>
        <properties>
            <property name="ust:sentDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
</aspects>

So every node having any of these three aspects must be counted.

For this sample, I’ve prepared a repository with 2,403 nodes including any of these aspects.

Using CMIS

A simple Groovy script can be written around a basic CMIS query:

import org.apache.chemistry.opencmis.commons.*
import org.apache.chemistry.opencmis.commons.data.*
import org.apache.chemistry.opencmis.commons.enums.*
import org.apache.chemistry.opencmis.client.api.*
import org.apache.chemistry.opencmis.client.util.*

String cql = "SELECT cmis:objectId FROM ust:businessDocument"

OperationContext opCon = session.createOperationContext();
opCon.setMaxItemsPerPage(1000000);

ItemIterable<QueryResult> results = session.query(cql, false, opCon)

println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"


--------------------------------------
Total number: 1000
Has more: true
--------------------------------------

However, CMIS (and also FTS) can only retrieve 1,000 elements: the reported total stops at 1,000 even though hasMoreItems is true. You can play with paging and skipping, but there is no (simple) way to obtain more than 1,000.

Using Database

Querying the Alfresco database directly is not recommended, but this looks like a good occasion to do it.

Let’s start with a simple query to see what happens.

SELECT count(1)
FROM alf_node AS n,
  alf_node_aspects AS a,
  alf_qname AS q,
  alf_namespace AS ns,
  alf_store AS s
WHERE a.qname_id = q.id
  AND a.node_id = n.id
  AND q.ns_id = ns.id
  AND n.store_id = s.id
  AND s.protocol = 'workspace'
  AND s.identifier = 'SpacesStore'
  AND ns.uri = 'http://www.ust-global.com/model/business/1.0'
  AND q.local_name in ('businessDocument');

 count
-------
   801
(1 row)

It looks like parent aspects are not stored against the node: only 801 nodes carry businessDocument itself. So we need to include every inheriting aspect in the query.

SELECT count(1)
FROM alf_node AS n,
  alf_node_aspects AS a,
  alf_qname AS q,
  alf_namespace AS ns,
  alf_store AS s
WHERE a.qname_id = q.id
  AND a.node_id = n.id
  AND q.ns_id = ns.id
  AND n.store_id = s.id
  AND s.protocol = 'workspace'
  AND s.identifier = 'SpacesStore'
  AND ns.uri = 'http://www.ust-global.com/model/business/1.0'
  AND q.local_name in ('businessDocument', 'inboundDoc', 'outboundDoc');

 count
-------
  2403
(1 row)

So we have the number we were looking for, but we are scanning the alf_node table to get it: database performance could be degraded!
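The list of aspect names for the IN clause can be derived from the content model instead of being typed by hand. The following Python lines are only a sketch: the parent map is copied manually from the model above, the helper names are hypothetical, and the %s placeholders assume a psycopg-style PostgreSQL driver.

```python
from collections import defaultdict

# Parent of each aspect, copied by hand from the content model above
# (None means the aspect has no parent).
ASPECT_PARENTS = {
    "businessDocument": None,
    "inboundDoc": "businessDocument",
    "outboundDoc": "businessDocument",
}


def aspect_with_descendants(root, parents):
    """Return the root aspect followed by every aspect inheriting from it."""
    children = defaultdict(list)
    for name, parent in parents.items():
        if parent is not None:
            children[parent].append(name)

    found, pending = [], [root]
    while pending:
        current = pending.pop(0)
        found.append(current)
        # Sorted only to make the resulting order deterministic.
        pending.extend(sorted(children[current]))
    return found


def in_clause(names):
    """Build a parameterised IN clause (assumed %s placeholders) and its values."""
    placeholders = ", ".join(["%s"] * len(names))
    return f"q.local_name IN ({placeholders})", list(names)


names = aspect_with_descendants("businessDocument", ASPECT_PARENTS)
clause, params = in_clause(names)
print(clause)
print(params)
```

The clause and the parameter list would replace the last line of the query above, so a new inheriting aspect only has to be added to the parent map.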

Using SOLR

Skipping all the Alfresco overhead, we can query the alfresco core directly through the SOLR engine:

https://localhost/solr/alfresco/afts?q=ASPECT:%22ust:businessDocument%22

<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">6</int>
		<lst name="params">
			<str name="q">ASPECT:"ust:businessDocument"</str>
		</lst>
	</lst>
	<result name="response" numFound="2403" start="0">
		<doc>
			<str name="id">_DEFAULT_!8000000000000019!8000000000005279</str>
			<long name="_version_">0</long>
			<long name="DBID">21113</long>
		</doc>
	</result>
	<bool name="processedDenies">false</bool>
</response>

So numFound gives us the 2,403 we were looking for, with a simple query and without degrading Alfresco performance.
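If the count has to be read by a script, numFound can be extracted from the XML with the Python standard library. This is a minimal sketch: the sample text is a trimmed copy of the response above, and read_num_found is a hypothetical helper name. Fetching the URL (certificates, authentication) is left out on purpose, as it depends on each installation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the SOLR response shown above.
SAMPLE_RESPONSE = """<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">6</int>
    </lst>
    <result name="response" numFound="2403" start="0">
        <doc>
            <long name="DBID">21113</long>
        </doc>
    </result>
</response>"""


def read_num_found(xml_text):
    """Return the numFound attribute of the <result> element as an integer."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in the SOLR response")
    return int(result.attrib["numFound"])


print(read_num_found(SAMPLE_RESPONSE))
```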

Using REST API

After talking for a while with Younes Regaieg and Axel Faust on the Alfresco IRC channel, I realised that there is also a way to provoke a SOLR query when invoking the REST API. When the query cannot be processed as a TMQ (Transactional Metadata Query), Alfresco also counts the elements when using the REST API.

This is why we are adding name:* to the query in the following code snippet.

[POST]

https://localhost/alfresco/api/-default-/public/search/versions/1/search

[Payload]

{
  "query": 
  {
     "language": "afts", 
     "query": "name:* AND ASPECT:\"ust:businessDocument\""
  }
}

The total is returned in the totalItems field of the response:

{
    "list": {
        "pagination": {
            "count": 100,
            "hasMoreItems": true,
            "totalItems": 2403,
            "skipCount": 0,
            "maxItems": 100
        },
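The request and the reading of totalItems can be sketched in Python as follows. The helper names are hypothetical, the sample response is a trimmed version of the one above, and the paging block is an assumption of mine: since only the total is wanted, it asks for a single entry per page. Sending the POST is omitted, as it depends on the authentication of each installation.

```python
import json


def build_count_payload(aspect):
    """Build the search body; name:* keeps the query out of TMQ."""
    return {
        "query": {
            "language": "afts",
            "query": f'name:* AND ASPECT:"{aspect}"',
        },
        # Assumption: only the total is needed, so one entry per page is requested.
        "paging": {"maxItems": 1, "skipCount": 0},
    }


def read_total_items(response_text):
    """Return list.pagination.totalItems from a search response."""
    body = json.loads(response_text)
    return body["list"]["pagination"]["totalItems"]


# Trimmed sample response with the same pagination values as above.
SAMPLE_RESPONSE = json.dumps({
    "list": {
        "pagination": {
            "count": 100,
            "hasMoreItems": True,
            "totalItems": 2403,
            "skipCount": 0,
            "maxItems": 100,
        },
        "entries": [],
    }
})

print(json.dumps(build_count_payload("ust:businessDocument"), indent=2))
print(read_total_items(SAMPLE_RESPONSE))
```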

The same result can be obtained in different ways, but there is usually one alternative that is better than the others. Some experimentation on test environments can save you a lot of trouble in your real service!


Alfresco 6, restoring browser basic auth popup for remote APIs

Recently I’ve found another small and undocumented change in Alfresco 6 to be considered before upgrading.

Starting with alfresco-remote-api.6.3, web browsers no longer present the basic auth popup by default, so operations like CMIS Browsing are not available from the browser.

In order to restore the previous behaviour, a new property has to be added to alfresco-global.properties:

alfresco.restApi.basicAuthScheme=true

The main problem is that, once this behaviour has been restored, ADF apps also show the basic auth popup on the login page before logging in, which is not very user friendly.

Probably for organisations using both CMIS Browsing and ADF applications a patch will be required.
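To verify which behaviour a given installation has, it may help to remember that a browser only shows its native popup when a 401 response carries a WWW-Authenticate header with the Basic scheme. The Python sketch below encodes just that rule. The function name is hypothetical, the header parsing is deliberately simplified, and the idea that the property above switches this header on and off is my assumption, not something taken from Alfresco documentation.

```python
def would_prompt_basic_auth(status, headers):
    """True when a browser would show its native basic auth popup for this response."""
    if status != 401:
        return False
    # Header names are case-insensitive in HTTP.
    for name, value in headers.items():
        if name.lower() == "www-authenticate":
            # Simplified parsing: take the first word of each comma-separated part
            # and look for the Basic scheme among them.
            schemes = [part.strip().split(" ")[0].lower() for part in value.split(",")]
            if "basic" in schemes:
                return True
    return False


# Hypothetical responses from an unauthenticated request to a remote API.
print(would_prompt_basic_auth(401, {"WWW-Authenticate": 'Basic realm="Alfresco"'}))
print(would_prompt_basic_auth(401, {}))
```

The status code and headers can be taken from any HTTP client or from the network tab of the browser.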

How to use Docker Composition to develop with Alfresco 5/6

Alfresco 5 can be installed in many different ways:

However, since Alfresco 6, the company has started a new distribution approach based only on Docker for Kubernetes. Alfresco 6 is currently only at beta stage, so things can change in the future.

Product strategy is to deliver different dockerizations for Enterprise and Community versions:

  • ACS Enterprise
    • Docker images for “micro-services” will only be available under subscription
    • Currently libreoffice, imagemagick & alfresco-pdf-renderer are included only in Enterprise, and it looks like the product will be split into many more services in the future
    • Other services and products (from partners or from Alfresco itself) will be included in this private catalog to be used only with Alfresco Enterprise
    • It’s designed to provide native K8s deployment, so Helm charts may also be provided only for Enterprise customers
  • ACS Community
    • This composition delivers a more monolithic approach, providing only big blocks like alfresco, share and solr
    • There is no official deployment guide for now, but I made a simple approach to share with the Community

Additionally, Alfresco ADF applications have also started to deliver Docker compositions for development and testing since the 2.0.0 release.

Although there are some guides to develop single modules with Docker for Alfresco 6, there is no specification on how to build a unified integration testing environment. A sample Docker Composition derived from our internal projects is described below.

Required components

  • Database (PostgreSQL or MariaDB)
  • Alfresco repository web application
  • SOLR indexer web application
  • Share web application
  • ADF web application
  • LibreOffice transformation server
  • Web Server like Apache HTTPd or Nginx

Database

Official images from PostgreSQL and MariaDB can be used directly in our composition.

One Docker volume for data storage is required in order to provide persistence.

Alfresco repository

As we are using Alfresco 5.2 for now, we are starting with keensoft templates, but Alfresco will officially provide a basic image for the repository from Alfresco 6 onwards.

Configuration is provided by an external alfresco-global.properties file included in the assets folder and injected during the image building process.

Modules can be installed by using AMP or JAR packaging. The files can be copied to the amps and jars folders to be deployed in alfresco.war during the image building process.

One Docker volume for data storage is required in order to provide persistence.

SOLR

Alfresco started to deliver Docker images for SOLR 6, so they can be used directly in our composition.

One Docker volume for data storage is required in order to provide persistence.

Share

As we are using Alfresco 5.2 for now, we are starting with keensoft templates, but Alfresco will officially provide a basic image for Share from Alfresco 6 onwards.

Configuration is provided by an external share-config-custom.xml file included in the assets folder and injected during the image building process.

Modules can be installed by using AMP or JAR packaging. The files can be copied to the amps_share and jars folders to be deployed in share.war during the image building process.

No persistent storage is required.

ADF

Alfresco does not provide any official application based on ADF to be installed as the main user UI. However, you can find a Docker image for Alfresco Content Application (the reference sample application).

In any case, as every ADF sample application provided by Alfresco uses the root path out of the box, the packaging instructions in package.json have to be modified to produce a web application that can be deployed under a different base href.
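The change to package.json can be scripted. The sketch below is written in Python and assumes that the application is built with Angular CLI through a "build" script calling ng build, so that the --base-href option applies. The package name, the script content and the /adf/ path are hypothetical values chosen for illustration.

```python
import json


def with_base_href(package, base_href, script="build"):
    """Return a copy of the package.json data whose build script sets --base-href."""
    updated = json.loads(json.dumps(package))  # deep copy of plain JSON data
    command = updated["scripts"][script]
    # Leave the script untouched when a base href is already configured.
    if "--base-href" not in command:
        updated["scripts"][script] = f"{command} --base-href {base_href}"
    return updated


# Hypothetical package.json fragment; a real ADF application has many more entries.
package = {"name": "adf-app", "scripts": {"build": "ng build --prod"}}
print(with_base_href(package, "/adf/")["scripts"]["build"])
```

In a real build the dictionary would be loaded from package.json with json.load and written back with json.dump before running the build.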

No persistent storage is required.

LibreOffice

Alfresco is releasing an official LibreOffice image, but it looks like this image will work only with Enterprise. This is why we are including our own image from keensoft templates.

No persistent storage is required.

Web Server

Alfresco is not including any configuration to run all the components together, as they are aiming to deploy the whole thing in K8s. We are using an approach based on Apache HTTPd to configure a simple proxy that delivers all these services over plain HTTP on port 80.

Operations

The following operations are supported by this configuration:

  • Modules deployment in repository
  • Modules deployment in Share
  • Repository configuration
  • Share configuration
  • ADF application updating
  • Docker rebooting with persistent storage

Recap

A sample project including a reference Docker Composition and some instructions is available at https://github.com/angelborroy/alfresco-docker-201707-GA

Alfresco 6 will require more work from the Community in order to be adopted by small companies, but building a common base could be helpful for everyone.