Using Alfresco associations in CMIS

CMIS is a standard specification designed to be used as a client API for different ECM platforms, and Alfresco provides great support for it.

However, when managing associations (called relationships in CMIS), there are some limitations to consider:

  • Associations cannot be set when the node is created; they must be created afterwards, in a separate operation
  • Associations are not available in CMIS QL, as the cmis:relationship type is not queryable
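As a sketch of the first limitation, the association can be added in a second request once both nodes exist, using the createRelationship action of the CMIS Browser binding, which is posted to the repository base URL. This assumes admin:admin credentials and that Alfresco exposes the cmisassoc:related association as the relationship type R:cmisassoc:related (Alfresco prefixes association types with R:). The function name create_related_assoc is hypothetical. The object IDs are sent with --form-string rather than -F because Alfresco document IDs can contain a semicolon (a version suffix such as ;1.0), which -F would try to interpret as a field option.

```shell
# Assumed defaults: local Alfresco Browser binding URL and admin:admin credentials
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}
CMIS_AUTH=${CMIS_AUTH:-admin:admin}

# Hypothetical helper: create a cmisassoc:related association between two existing nodes.
# $1 = source object ID, $2 = target object ID.
# The relationship type ID R:cmisassoc:related is an assumption about the sample model.
create_related_assoc() {
  curl -s -u "$CMIS_AUTH" \
    -F cmisaction=createRelationship \
    -F 'propertyId[0]=cmis:objectTypeId' \
    -F 'propertyValue[0]=R:cmisassoc:related' \
    -F 'propertyId[1]=cmis:sourceId' \
    --form-string "propertyValue[1]=$1" \
    -F 'propertyId[2]=cmis:targetId' \
    --form-string "propertyValue[2]=$2" \
    "$CMIS_BASE_URL"
}
```

For example, `create_related_assoc "$SOURCE_ID" "$TARGET_ID"` would be called right after the two createDocument requests that returned those IDs.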

A sample project, cmis-associations-alfresco, is available to illustrate these operations.

This project includes an Alfresco content model with an association named cmisassoc:related and an additional property named cmisassoc:relatedRef that stores a copy of these associations. Since this copy is a regular property, it can be used from CMIS QL to find the related documents.
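A minimal sketch of such a query follows. It assumes that cmisassoc:relatedRef is multi-valued and defined on a content type named cmisassoc:document (a hypothetical name; if the property belongs to an aspect instead, Alfresco requires a JOIN with the aspect). The searched value is quoted following the CMIS QL rule that a backslash or a single quote inside a string literal must be preceded by a backslash. Both function names are hypothetical.

```shell
# Quote a value as a CMIS QL string literal: escape backslashes first, then single quotes.
cmisql_string() {
  printf "'%s'" "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")"
}

# Hypothetical statement: documents whose multi-valued cmisassoc:relatedRef contains $1.
# cmisassoc:document is an assumed type name, not taken from the sample project.
related_query() {
  printf 'SELECT cmis:objectId, cmis:name FROM cmisassoc:document WHERE %s = ANY cmisassoc:relatedRef' \
    "$(cmisql_string "$1")"
}
```

The resulting statement can be sent to the repository with the query action of the CMIS Browser binding.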

The project also provides a sample CMIS client based on Spring Boot for testing.

Using CMIS Browser protocol

CMIS (Content Management Interoperability Services) is an OASIS standard for working with one or more content management repositories.

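The API is exposed to clients through three different bindings:

  • Web Services binding (SOAP)
  • AtomPub binding (based on the Atom Publishing Protocol)
  • Browser binding (JSON over HTTP, introduced in CMIS 1.1)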

Additionally, the Apache Chemistry project provides client libraries for several languages, such as Java, Python, PHP, .NET, Objective-C and JavaScript.

This blog post describes the RESTful Browser binding operations for an Alfresco CMIS repository.

Base URL and Authentication

For Alfresco 4.2+, the base URL for CMIS Browser binding is:

http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser

HTTP Basic Auth can be used to obtain repository information from this base URL:

$ curl -X GET \
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser \
-H 'Authorization: Basic YWRtaW46YWRtaW4='
{
   "-default-":{
      "repositoryId":"-default-",
      "repositoryName":"",
      "repositoryDescription":"",
      "vendorName":"Alfresco",
      "productName":"Alfresco Enterprise",
      "productVersion":"6.1.0 (3 r0f0034ee-b79)",
      "rootFolderId":"b1ee4176-6712-468c-89b0-dd0352e93450",
      "latestChangeLogToken":null,
      "cmisVersionSupported":"1.1",
      …
   }
}
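The Authorization header used in these examples is just the Base64 encoding of user:password, here admin:admin. As a sketch, the header value can be computed in the shell, or curl can build it from its -u option. The variable names CMIS_BASE_URL and CMIS_AUTH and both function names are illustrative assumptions.

```shell
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}
CMIS_AUTH=${CMIS_AUTH:-admin:admin}

# Build the Authorization header value for HTTP Basic Auth from "user:password".
# printf is used instead of echo so that no trailing newline is encoded.
# Note: GNU base64 wraps output after 76 characters, which is fine for short credentials.
basic_auth_header() {
  printf 'Basic %s' "$(printf '%s' "$1" | base64)"
}

# Same request as above: curl -u computes the Basic Auth header itself.
get_repository_info() {
  curl -s -u "$CMIS_AUTH" "$CMIS_BASE_URL"
}
```

`basic_auth_header admin:admin` yields the same value as the header in the request above.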

Navigation

The repository can be browsed by using folder and document names in the URL. For instance, the “Shared” folder in the repository root can be accessed with the following request:

$ curl -X GET \
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser/root/Shared \
-H 'Authorization: Basic YWRtaW46YWRtaW4='
{
   "objects":[
      {
         "object":{
            "properties":{
               "cmis:name":{
                  "value":"New Folder"
               }
            }
         }
      },
      {
         "object":{
            "properties":{
               "cmis:name":{
                  "value":"Sample-Document.docx"
               }
            }
         }
      },
      {
         "object":{
            "properties":{
               "cmis:name":{
                  "value":"test.txt"
               }
            }
         }
      }
   ],
   "hasMoreItems":false,
   "numItems":3
}
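Since folder and document names become path segments of the URL, names containing spaces or other reserved characters, such as the “New Folder” entry above, have to be percent-encoded. A POSIX shell and awk sketch, assuming the same base URL, that encodes every byte outside the unreserved URI characters while keeping / as the segment separator; encode_path and cmis_path_url are hypothetical names.

```shell
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}

# Percent-encode a repository path byte by byte (C locale), keeping '/' unescaped.
encode_path() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        # Unreserved characters and the path separator are kept as they are
        if (c ~ /^[A-Za-z0-9._~-]$/ || c == "/") out = out c
        else out = out sprintf("%%%02X", ord[c])
      }
      printf "%s", out
    }'
}

# Build the Browser binding URL for a path relative to the repository root.
cmis_path_url() {
  printf '%s/root/%s' "$CMIS_BASE_URL" "$(encode_path "$1")"
}
```

For example, `curl -u admin:admin "$(cmis_path_url 'Shared/New Folder')"` requests the children of the “New Folder” folder.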

The list of children is returned because children is the default selector for a folder object. When a different type of object is accessed, the default result depends on its base type:

Base type            Default selector
cmis:document        content
cmis:folder          children
cmis:relationship    object
cmis:policy          object
cmis:item            object

Objects

When accessing an object, the cmisselector parameter of an HTTP GET request specifies which information is returned: children, parents, object, properties, content, renditions, versions, relationships, policies, acl

For instance, the properties of a document can be retrieved with the following request:

$ curl -X GET \
'http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser/root/Shared/test.txt?cmisselector=properties' \
-H 'Authorization: Basic YWRtaW46YWRtaW4='
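The same request can be wrapped in a small helper so that any of the selectors listed above can be requested by path. This is a sketch assuming admin:admin credentials and paths without characters that need percent-encoding; cmis_get is a hypothetical name. curl -G appends the --data-urlencode values to the URL as a query string.

```shell
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}
CMIS_AUTH=${CMIS_AUTH:-admin:admin}

# GET an object by path with the given cmisselector (properties, parents, content, ...).
# $1 = path below the repository root, $2 = selector name.
cmis_get() {
  curl -s -G -u "$CMIS_AUTH" \
    --data-urlencode "cmisselector=$2" \
    "$CMIS_BASE_URL/root/$1"
}
```

`cmis_get Shared/test.txt properties` returns the properties as JSON, while `cmis_get Shared/test.txt content > /tmp/copy.txt` saves the document content to a local file.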

Actions

To perform operations on a repository node, the cmisaction parameter must be sent in an HTTP POST request: createDocument, createFolder, createRelationship, createPolicy, createItem, query, createType, deleteType, delete, deleteTree, deleteContent, checkOut, checkIn, update

The createDocument action creates a new document of type cmis:document named test.txt, uploading its content from the local file /tmp/test.txt:

$ curl -X POST \
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser/root/Shared \
-H 'Authorization: Basic YWRtaW46YWRtaW4=' \
-F cmisaction=createDocument \
-F 'propertyId[0]=cmis:objectTypeId' \
-F 'propertyValue[0]=cmis:document' \
-F 'propertyId[1]=cmis:name' \
-F 'propertyValue[1]=test.txt' \
-F file=@/tmp/test.txt
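The createFolder action from the list above follows the same pattern, without the content part. A sketch assuming admin:admin and a parent path without reserved characters; cmis_create_folder is a hypothetical name, and the folder name is sent with --form-string so that characters such as ; are not interpreted by curl.

```shell
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}
CMIS_AUTH=${CMIS_AUTH:-admin:admin}

# Create a cmis:folder named $2 inside the folder at path $1 (relative to the root).
cmis_create_folder() {
  curl -s -u "$CMIS_AUTH" \
    -F cmisaction=createFolder \
    -F 'propertyId[0]=cmis:objectTypeId' \
    -F 'propertyValue[0]=cmis:folder' \
    -F 'propertyId[1]=cmis:name' \
    --form-string "propertyValue[1]=$2" \
    "$CMIS_BASE_URL/root/$1"
}
```

For example, `cmis_create_folder Shared Reports` would create a folder named Reports inside Shared.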

The document can then be removed with the delete action:

$ curl -X POST \
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser/root/Shared/test.txt \
-H 'Authorization: Basic YWRtaW46YWRtaW4=' \
-F cmisaction=delete

Queries can also be run with the query action, which is posted to the repository base URL:

$ curl -X POST \
http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser \
-H 'Authorization: Basic YWRtaW46YWRtaW4=' \
-F cmisaction=query \
-F "statement=select * from cmis:document where cmis:name like 'test.txt'"
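Besides the query action, the CMIS 1.1 Browser binding also accepts queries as an HTTP GET on the repository URL, with cmisselector=query and the statement in the q parameter. A sketch assuming admin:admin credentials; cmis_query is a hypothetical name, and --data-urlencode takes care of encoding the spaces and quotes of the statement.

```shell
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}
CMIS_AUTH=${CMIS_AUTH:-admin:admin}

# Run a CMIS QL statement ($1) with an HTTP GET on the repository URL.
cmis_query() {
  curl -s -G -u "$CMIS_AUTH" \
    --data-urlencode 'cmisselector=query' \
    --data-urlencode "q=$1" \
    "$CMIS_BASE_URL"
}
```

For example: `cmis_query "select * from cmis:document where cmis:name like 'test.txt'"`.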

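It’s recommended to add the parameter succinct=true to every HTTP request to obtain a smaller payload: property values are then returned as simple name/value pairs in succinctProperties instead of the verbose properties structure.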
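For POST actions, succinct=true travels as one more form field, while for GET requests it is appended to the query string (for example `?cmisselector=properties&succinct=true`). A sketch of a wrapper for actions that always requests the succinct format; cmis_post is a hypothetical name, and any extra curl arguments are passed through unchanged.

```shell
CMIS_BASE_URL=${CMIS_BASE_URL:-http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser}
CMIS_AUTH=${CMIS_AUTH:-admin:admin}

# POST a Browser binding action, always asking for the succinct JSON format.
# $1 = target URL, $2 = cmisaction; remaining arguments are passed to curl as-is.
cmis_post() {
  local target=$1 action=$2
  shift 2
  curl -s -u "$CMIS_AUTH" -F "cmisaction=$action" -F succinct=true "$@" "$target"
}
```

For example, `cmis_post "$CMIS_BASE_URL/root/Shared/test.txt" delete` or `cmis_post "$CMIS_BASE_URL" query -F "statement=select cmis:name from cmis:folder"`.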

Additional samples

Additional CMIS Browser REST API invocations are available on GitHub as a Postman Collection:

https://gist.github.com/aborroy/3f1f2360b0e85067675643aa648a8112

Alfresco Global Virtual Hack-a-thon – Spring 2019

Language detection during indexing

This year Alfresco held a Hack-a-thon in Spring, in addition to the classic Autumn Global Virtual Hack-a-thon that has taken place every October in recent years.

A hack-a-thon is an event in which software developers, architects, interface designers and – to a lesser extent – project managers collaborate in a restricted time span on a project of their choice outside of the normal work environment and its restrictions. The projects can be anything – experimental prototypes and extensions to existing functionality that people normally don’t get around to coding are popular options. In the context of the Alfresco community we have expanded this meaning to cover anything related to the Alfresco product, its ecosystem and community. You can find a detailed view of the projects developed during this hack-a-thon in the Community page.

During this event I focused on automatic language detection during indexing. In current Alfresco versions, the content language is set in the Alfresco Repository from the client locale configuration, and SOLR takes this language from the repository when indexing. However, in cross-locale environments some users upload content in a language different from their client language settings. Identifying the right language provides better search results.

A first approach to this concept has been drafted at:

https://github.com/aborroy/SearchServices/pull/1/files

I used the LangDetect library in the SolrInformationServer class to inspect the first 10k characters of the text of every document, in order to set the locale from the detected language. This small tweak allows the content to be indexed with the right locale, instead of relying on the previous, erratic behaviour, which is based on the browser locale detected by the repository.

During the session, this auto detection feature was tested with a broad catalog of documents in different languages (English, Spanish, French and Tagalog), and the results were very accurate.

I used URLs like the following one to inspect the value of the locale field in SOLR. The fl=* parameter returns all the stored fields of the matching documents, while facet=on together with facet.field=sys:locale adds the number of matching documents for each locale value:

http://localhost:8983/solr/alfresco/afts?facet.field=sys:locale&facet=on&fl=*&indent=on&q=cm:name:%22Sample-Document-ES.docx%22&wt=json
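The same SOLR request can be issued with curl -G and --data-urlencode, which produces the %22 escaping of the quotes around the document name. A sketch assuming the SOLR endpoint above is reachable exactly as in that URL; locale_facet_for is a hypothetical name.

```shell
SOLR_AFTS_URL=${SOLR_AFTS_URL:-http://localhost:8983/solr/alfresco/afts}

# Return all stored fields of the documents named $1, plus facet counts on sys:locale.
locale_facet_for() {
  curl -s -G "$SOLR_AFTS_URL" \
    --data-urlencode "q=cm:name:\"$1\"" \
    -d facet=on \
    -d facet.field=sys:locale \
    -d 'fl=*' \
    -d indent=on \
    -d wt=json
}
```

For example, `locale_facet_for Sample-Document-ES.docx` reproduces the URL above.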

At the end of the day, this is my recap:

  • When the whole document is read to detect the language, execution time grows proportionally to the document length. This would only pay off if the Alfresco SOLR model stored the locale property as a multi-valued field, but currently it is single-valued. This is why I included a text length limit, to keep the performance of the feature under control.
  • Tika also provides a langdetect library, but it relies on Google Guava 16.01, which is incompatible with the Alfresco SOLR Maven project. Externalising the language detection could make it possible to use Tika or any other tool, such as Textract.

I spent the day in the Hacker Room with my former colleagues Daniel Fernández and David Martos, and virtually with many other well-known Alfresco developers. As in past years, Axel Faust and Francesco Corti followed the session during the whole Sun-to-Sun day.

If you didn’t participate this year, it’s time to prepare the Alfresco Global Virtual Hack-a-thon – Autumn 2019!