Alfresco, counting more than 1000 elements

Many people need to count elements inside the repository. In a common repository, having more than 1,000 elements from the same type or aspect is a regular scenario.

In this blog post, several ways of counting elements in Alfresco repository are exposed.

Problem statement

How many nodes having businessDocument aspect are in the repository?

Let’s assume the following content model: a base aspect named ust:businessDocument and two inheriting aspects named ust:inboundDoc and ust:outboundDoc

<aspects>
    <aspect name="ust:businessDocument">
        <properties>
            <property name="ust:docDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
    <aspect name="ust:inboundDoc">
        <parent>ust:businessDocument</parent>
        <properties>
            <property name="ust:receivedDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
    <aspect name="ust:outboundDoc">
        <parent>ust:businessDocument</parent>
        <properties>
            <property name="ust:sentDate">
                <type>d:datetime</type>
            </property>
        </properties>
    </aspect>
</aspects>

So every content having any of these three aspects must be included.

For this sample, I’ve prepared a repository with 2,403 nodes including any of these aspects.

Using CMIS

A simple Groovy script can be developed by using a simple CMIS Query

import org.apache.chemistry.opencmis.commons.*
import org.apache.chemistry.opencmis.commons.data.*
import org.apache.chemistry.opencmis.commons.enums.*
import org.apache.chemistry.opencmis.client.api.*
import org.apache.chemistry.opencmis.client.util.*

String cql = "SELECT cmis:objectId FROM ust:businessDocument"

OperationContext opCon = session.createOperationContext();
opCon.setMaxItemsPerPage(1000000);

ItemIterable<QueryResult> results = session.query(cql, false, opCon)

println "--------------------------------------"
println "Total number: ${results.totalNumItems}"
println "Has more: ${results.hasMoreItems}"
println "--------------------------------------"


--------------------------------------
Total number: 1000
Has more: true
--------------------------------------

However, CMIS (and also FTS) can only retrieve 1,000 elements. You can play with paging and skipping, but there is no (simple) way to obtain more than 1,000.

Using Database

It’s not recommended to play with Alfresco Database, but it looks like this is the right chance to do it.

Let’s start with a simple query to see what happens.

SELECT count(1)
FROM alf_node AS n,
  alf_node_aspects AS a,
  alf_qname AS q,
  alf_namespace AS ns,
  alf_store AS s
WHERE a.qname_id = q.id
  AND a.node_id = n.id
  AND q.ns_id = ns.id
  AND n.store_id = s.id
  AND s.protocol = 'workspace'
  AND s.identifier = 'SpacesStore'
  AND ns.uri = 'http://www.ust-global.com/model/business/1.0'
  AND q.local_name in ('businessDocument');
 count
-------
   801
(1 row)

It looks like parent aspects are not related with the node, so we need to include every inherited aspect in the query.

SELECT count(1)
FROM alf_node AS n,
  alf_node_aspects AS a,
  alf_qname AS q,
  alf_namespace AS ns,
  alf_store AS s
WHERE a.qname_id = q.id
  AND a.node_id = n.id
  AND q.ns_id = ns.id
  AND n.store_id = s.id
  AND s.protocol = 'workspace'
  AND s.identifier = 'SpacesStore'
  AND ns.uri = 'http://www.ust-global.com/model/business/1.0'
  AND q.local_name in ('businessDocument', 'inboundDoc', 'outboundDoc');
 count
-------
  2403
(1 row)

So we have the number we were looking for, but we are scanning alf_node table to get it: database performance could be degraded!

Using SOLR

Skipping all Alfresco overload, we can use directly SOLR Engine to perform a query in alfresco core

https://localhost/solr/alfresco/afts?q=ASPECT:%22ust:businessDocument%22

<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">6</int>
		<lst name="params">
			<str name="q">ASPECT:"ust:businessDocument"</str>
		</lst>
	</lst>
	<result name="response" numFound="2403" start="0">
		<doc>
			<str name="id">_DEFAULT_!8000000000000019!8000000000005279</str>
			<long name="_version_">0</long>
			<long name="DBID">21113</long>
		</doc>
	</result>
	<bool name="processedDenies">false</bool>
</response>

So we found that numFound 2,403 by using a simple query and without degrading Alfresco performance.

REST API

After talking a while with Younes Regaieg and Axel Faust at Alfresco IRC, I realised that there is also a way to provoke a SOLR query when using REST API invocation. When including a query that cannot be processed as TMQ, Alfresco is also counting the elements when using REST API.

This is why we are adding name:* to the query in the following code snippet.

[POST]

https://localhost/alfresco/api/-default-/public/search/versions/1/search

[Payload]

{
  "query": 
  {
     "language": "afts", 
     "query": "name:* AND ASPECT:\"ust:businessDocument\""
  }
}

Results are included in totalItems field in the response:

{
    "list": {
        "pagination": {
            "count": 100,
            "hasMoreItems": true,
            "totalItems": 2403,
            "skipCount": 0,
            "maxItems": 100
        },

The same thing can be obtained in a different way, but there is always an alternative that is better than the others. Some experimentation on test environments can save you more troubles in your real service!

Anuncios

6 comentarios en “Alfresco, counting more than 1000 elements

      • I don’t think that would be the case; Unless when your query goes through DB. A trick to force it to go to solr would be to add a wildcard like “cm:name:*”

  1. Hey, Gethin, we made it again.

    Axel and Younes helped me to understand how to get that count by using the “new” REST API.

    Thanks to all!

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión /  Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión /  Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión /  Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión /  Cambiar )

Conectando a %s