Alfresco, counting more than 1000 elements

Many people need to count elements inside the repository. In a common repository, having more than 1,000 elements from the same type or aspect is a regular scenario.

In this blog post, several ways of counting elements in Alfresco repository are exposed.

Problem statement

How many nodes having businessDocument aspect are in the repository?

Let’s assume the following content model: a base aspect named ust:businessDocument and two inheriting aspects named ust:inboundDoc and ust:outboundDoc

<aspects>
    <aspect name="ust:businessDocument">
        <properties>
            <property name="ust:docDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
    <aspect name="ust:inboundDoc">
        <parent>ust:businessDocument</parent>
        <properties>
            <property name="ust:receivedDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
    <aspect name="ust:outboundDoc">
        <parent>ust:businessDocument</parent>
        <properties>
            <property name="ust:sentDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
</aspects>

So every content having any of these three aspects must be included.

For this sample, I’ve prepared a repository with 2,403 nodes including any of these aspects.

Using CMIS

A simple Groovy script can be developed by using a simple CMIS Query

import org.apache.chemistry.opencmis.commons.*
import org.apache.chemistry.opencmis.commons.data.*
import org.apache.chemistry.opencmis.commons.enums.*
import org.apache.chemistry.opencmis.client.api.*
import org.apache.chemistry.opencmis.client.util.*

String cql = "SELECT cmis:objectId FROM ust:businessDocument"

OperationContext opCon = session.createOperationContext();
opCon.setMaxItemsPerPage(1000000);

ItemIterable<QueryResult> results = session.query(cql, false, opCon)

println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"


--------------------------------------
Total number: 1000
Has more: true
--------------------------------------

However, CMIS (and also FTS) can only retrieve 1,000 elements. You can play with paging and skipping, but there is no (simple) way to obtain more than 1,000.

Using Database

It’s not recommended to play with Alfresco Database, but it looks like this is the right chance to do it.

Let’s start with a simple query to see what happens.

SELECT count(1)
FROM alf_node AS n,
  alf_node_aspects AS a,
  alf_qname AS q,
  alf_namespace AS ns,
  alf_store AS s
WHERE a.qname_id = q.id
  AND a.node_id = n.id
  AND q.ns_id = ns.id
  AND n.store_id = s.id
  AND s.protocol = 'workspace'
  AND s.identifier = 'SpacesStore'
  AND ns.uri = 'http://www.ust-global.com/model/business/1.0'
  AND q.local_name in ('businessDocument');
 count
-------
   801
(1 row)

It looks like parent aspects are not related with the node, so we need to include every inherited aspect in the query.

SELECT count(1)
FROM alf_node AS n,
  alf_node_aspects AS a,
  alf_qname AS q,
  alf_namespace AS ns,
  alf_store AS s
WHERE a.qname_id = q.id
  AND a.node_id = n.id
  AND q.ns_id = ns.id
  AND n.store_id = s.id
  AND s.protocol = 'workspace'
  AND s.identifier = 'SpacesStore'
  AND ns.uri = 'http://www.ust-global.com/model/business/1.0'
  AND q.local_name in ('businessDocument', 'inboundDoc', 'outboundDoc');
 count
-------
  2403
(1 row)

So we have the number we were looking for, but we are scanning alf_node table to get it: database performance could be degraded!

Using SOLR

Skipping all Alfresco overload, we can use directly SOLR Engine to perform a query in alfresco core

https://localhost/solr/alfresco/afts?q=ASPECT:%22ust:businessDocument%22

<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">6</int>
		<lst name="params">
			<str name="q">ASPECT:"ust:businessDocument"</str>
		</lst>
	</lst>
	<result name="response" numFound="2403" start="0">
		<doc>
			<str name="id">_DEFAULT_!8000000000000019!8000000000005279</str>
			<long name="_version_">0</long>
			<long name="DBID">21113</long>
		</doc>
	</result>
	<bool name="processedDenies">false</bool>
</response>

So we found that numFound 2,403 by using a simple query and without degrading Alfresco performance.

REST API

After talking a while with Younes Regaieg and Axel Faust at Alfresco IRC, I realised that there is also a way to provoke a SOLR query when using REST API invocation. When including a query that cannot be processed as TMQ, Alfresco is also counting the elements when using REST API.

This is why we are adding name:* to the query in the following code snippet.

[POST]

https://localhost/alfresco/api/-default-/public/search/versions/1/search

[Payload]

{
  "query": 
  {
     "language": "afts", 
     "query": "name:* AND ASPECT:\"ust:businessDocument\""
  }
}

Results are included in totalItems field in the response:

{
    "list": {
        "pagination": {
            "count": 100,
            "hasMoreItems": true,
            "totalItems": 2403,
            "skipCount": 0,
            "maxItems": 100
        },

The same thing can be obtained in a different way, but there is always an alternative that is better than the others. Some experimentation on test environments can save you more troubles in your real service!

9 thoughts on “Alfresco, counting more than 1000 elements

    • Just to make it easy for the readers, Search API is exactly the same as CMIS: you can only count 1,000 elements.

      • I don’t think that would be the case; Unless when your query goes through DB. A trick to force it to go to solr would be to add a wildcard like “cm:name:*”

      • Thanks for your contribution. The blog post has been updated with your tip.

  1. Hey, Gethin, we made it again.

    Axel and Younes helped me to understand how to get that count by using the “new” REST API.

    Thanks to all!

    • I don’t remember about NodeService, but for SearchService you can use following pattern:

      SearchParameters sp = new SearchParameters();
      sp.setLanguage(SearchService.LANGUAGE_CMIS_ALFRESCO);
      sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
      sp.setMaxItems(Integer.MAX_VALUE);
      sp.setMaxPermissionChecks(Integer.MAX_VALUE);

      • Hi Thanks for your response, is this way supports following things?

        1) total counts in response along with node ids
        2) filter by aspect name, file name, creation date
        3) sort by creation date
        4) pagination

        If yes, can you give example with it?

        Regards,
        Vishal Z.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s