Which Alfresco version am I using?

Until a few months ago, Alfresco versioned its Community and Enterprise editions with two numbers for the major version, plus:

  • a letter for minor version in Community (5.0.d)
  • a third number for minor version in Enterprise (5.0.4)

However, starting with the 5.1 release, the Community branch follows a different approach and is identified by the following pattern:

  • YEARMONTH-[EA | GA]
    • EA = Early access (only for testing purposes)
    • GA = Generally available (ready for prod environments)
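The Community pattern is regular enough to be checked mechanically. Here is a minimal sketch in shell, where parse_community_label is a hypothetical helper name (nothing shipped with Alfresco), that splits a label into year, month and release stage:

```shell
#!/bin/sh
# Split a Community release label such as 201701-GA into "YEAR MONTH STAGE".
# parse_community_label is a hypothetical helper name used for illustration only.
parse_community_label() {
  label="$1"
  case "$label" in
    # four-digit year, month 01-12, a dash, then EA or GA
    [0-9][0-9][0-9][0-9]0[1-9]-[EG]A | [0-9][0-9][0-9][0-9]1[0-2]-[EG]A)
      year=${label%??-??}     # drop the trailing "MM-XX"
      rest=${label#????}      # drop the leading year
      month=${rest%-*}        # keep what precedes the dash
      stage=${label#*-}       # keep what follows the dash
      printf '%s %s %s\n' "$year" "$month" "$stage"
      ;;
    *)
      echo "not a Community release label: $label" >&2
      return 1
      ;;
  esac
}
```

Calling parse_community_label 201701-GA prints the three parts separated by spaces, while an old-style label such as 5.0.d is rejected with a non-zero exit status.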

So, the latest mature version available for Community 5.1 is named 201605-GA, and the one for Community 5.2 is named 201701-GA. This makes sense because the alfresco and share components are now versioned individually, so Alfresco 201701-GA contains:

  • 5.2.e for the Alfresco platform (alfresco.war)
  • 5.2.d for Alfresco Share (share.war)

The Share component has a single development line, so the same software is used for both Community and Enterprise releases. However, the Enterprise release (which has been named Alfresco One for some months now) still uses the 5.2.0 scheme to identify the version, which reflects the core alfresco component of the release.

Once the server is running, the Alfresco version can be obtained by accessing http://localhost:8080/alfresco (although this number is also shown in other places).

alfresco-version

As can be seen, the version string shown there, Community – 5.2.0 (r134428-b13), follows yet another naming convention. The Alfresco Community web site contains a list that maps these numbers to the distribution names: Release Cross Reference. So, in this case, we have 201701-GA (aka 5.2.e).
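The same string can usually be read without a browser. The sketch below assumes that the repository exposes a server web script at /alfresco/service/api/server and that it returns JSON with a "version" field. The path, the placeholder credentials and the helper name extract_alfresco_version are assumptions to verify on your own installation.

```shell
#!/bin/sh
# Read JSON on stdin and print the value of its "version" field.
# extract_alfresco_version is a hypothetical helper name; it expects the
# field name and its value to sit on the same line of the response.
extract_alfresco_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Assumed usage against a local server (admin:admin is a placeholder):
#   curl -s -u admin:admin http://localhost:8080/alfresco/service/api/server \
#     | extract_alfresco_version
```

The value printed this way can then be looked up in the Release Cross Reference list like the one shown on the landing page.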

Many different naming conventions have been used for Alfresco products, but it is always possible to identify which version a server is running.

Alfresco, massive delete of users

Since neither the Share web interface nor the admin web console offers an option to delete users in bulk, this operation has to be performed by your own means.

I’m using the JavaScript Console to make running administration scripts easier, but any other tool (even curl) can be used.

// Get every user in Alfresco
var nodes = people.getPeople(null);

for each (var node in nodes) {
  // Build user object
  var user = utils.getNodeFromString(node);
  // Obtain userId
  var userName = user.properties["cm:userName"];
  // Obtain userHomeFolder (it can be null when the user has no home folder yet)
  var userHomeFolder = user.properties["cm:homeFolder"];
  // Filter users to be deleted by any criteria, starting with "CES" in this case
  if (userName.indexOf("CES") == 0) {
      var homeFolderName = (userHomeFolder != null) ? userHomeFolder.properties["cm:name"] : "(no home folder)";
      logger.log(userName + ", " + homeFolderName);
      // people.deletePerson(userName);
      // if (userHomeFolder != null) userHomeFolder.remove();
  }
}

Once you have verified that every userName logged by this script really has to be deleted, you can uncomment the following lines…

      // people.deletePerson(userName);
      // if (userHomeFolder != null) userHomeFolder.remove();

… and run the script again to actually perform the removal.
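The same "list first, delete later" approach also works from the command line. This is a minimal sketch, assuming the user names have already been exported one per line to a hypothetical users.txt file. select_users_by_prefix is a hypothetical helper, and the curl call in the comments assumes the repository's people web script accepts DELETE requests. It only removes the person, so the home folder would still need separate handling.

```shell
#!/bin/sh
# Keep only the user names (one per line on stdin) that start with a prefix.
# select_users_by_prefix is a hypothetical helper name.
select_users_by_prefix() {
  prefix="$1"
  while IFS= read -r user_name; do
    case "$user_name" in
      "$prefix"*) printf '%s\n' "$user_name" ;;
    esac
  done
}

# Dry run: review the candidates first.
#   select_users_by_prefix CES < users.txt
#
# Assumed removal step, one call per reviewed name
# (host and credentials are placeholders):
#   curl -u admin:admin -X DELETE \
#     "http://localhost:8080/alfresco/service/api/people/$user_name"
```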

As other Alfresco community members have pointed out on Twitter and LinkedIn, this script is just a sample to help you understand the approach. Many other factors (LDAP, repository permissions, content ownership…) have to be considered before using a script like this in a real environment.

Alfresco, installing OCR as an external service

Alfresco Simple OCR Action has become a popular way to provide an OCR service to Alfresco Community servers running Linux or Windows.

In many use cases, the configuration guide is enough, but in other scenarios intensive use of the OCR service requires a more complex deployment. Below, a configuration is described where the OCR service is installed on an external server, which keeps Alfresco capacity independent of how many OCR operations are running.

In this sample, Alfresco uses SSH to communicate with the OCR server, which runs pdfsandwich on CentOS 7, but any other protocol, OS or OCR software can be chosen.

Configuring OCR server

Software required:

  • CentOS Linux release 7.3.1611 (Core)
  • pdfsandwich 0.1.4

Ports required:

  • 22 (SSH): OCR service

The pdfsandwich program is used to build searchable PDFs from PDFs containing images or TIFF files. Alfresco will invoke this program from the command line through the Alfresco Simple OCR addon.

Installing pdfsandwich

Installing required dependencies.

$ yum -y install wget gcc gcc-c++ make autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel ocaml ImageMagick ImageMagick-devel

Installing leptonica from source code.

$ wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
$ tar xvf leptonica-1.72.tar.gz
$ cd leptonica-1.72
$ ./configure
$ make
$ make install

Installing tesseract OCR from source code.

$ wget http://github.com/tesseract-ocr/tesseract/archive/3.04.01.tar.gz
$ tar xvf 3.04.01.tar.gz
$ cd tesseract-3.04.01
$ ./autogen.sh
$ ./configure
$ make
$ make install
$ ldconfig

Installing every language package for tesseract.

$ wget https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz
$ tar xvf 3.04.00.tar.gz
$ mv tessdata-3.04.00/* /usr/local/share/tessdata/

Installing unpaper by using RPM.

$ wget http://dl.fedoraproject.org/pub/epel/6/x86_64/unpaper-0.3-4.el6.x86_64.rpm
$ rpm -ivh unpaper-0.3-4.el6.x86_64.rpm

Installing pdfsandwich from source code.

$ wget http://downloads.sourceforge.net/project/pdfsandwich/pdfsandwich%200.1.4/pdfsandwich-0.1.4.tar.bz2
$ tar xvf pdfsandwich-0.1.4.tar.bz2
$ cd pdfsandwich-0.1.4
$ ./configure
$ make
$ make install

Verifying the software has been installed properly.

$ pdfsandwich -version
pdfsandwich version 0.1.4
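pdfsandwich relies on the other programs installed above, so it can help to confirm that they are all reachable on the PATH as well. This is a minimal sketch. missing_tools is a hypothetical helper name, and convert is the ImageMagick binary installed with the dependencies.

```shell
#!/bin/sh
# Print every given command that cannot be found on PATH.
# Returns 0 when all of them are available, 1 otherwise.
# missing_tools is a hypothetical helper name.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# On the OCR server:
#   missing_tools tesseract unpaper convert pdfsandwich
```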

Configuring Alfresco server

Once Alfresco is installed and Alfresco Simple OCR is available, a script is created to invoke the remote OCR server.


#!/bin/bash

# pdfsandwich hostname
PDFSANDWICH_SERVER="alfresco-ocr.keensoft.es"

# extract filenames
INPUT=$(basename "$3")
OUTPUT=$(basename "$5")

# SSH parameters
SCP=scp
SSH=ssh

# copy original pdf to pdfsandwich server (quoted so paths with spaces survive locally)
$SCP "$3" "root@$PDFSANDWICH_SERVER:/tmp/$INPUT"

# execute pdfsandwich program (requires administrator privileges)
$SSH "root@$PDFSANDWICH_SERVER" "pdfsandwich $1 $2 /tmp/$INPUT $4 /tmp/$OUTPUT $6"

# copy transformed pdf back to alfresco server
$SCP "root@$PDFSANDWICH_SERVER:/tmp/$OUTPUT" "$5"

# remove temporary files
$SSH "root@$PDFSANDWICH_SERVER" "rm -f /tmp/$INPUT; rm -f /tmp/$OUTPUT"
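To see what the wrapper does with its positional parameters, the path rewriting can be tried in isolation. The sketch below only prints the command line that would be run on the OCR server. build_remote_command is a hypothetical helper, and the arguments in the usage comment are made up for illustration: the real argument list comes from the Alfresco Simple OCR configuration.

```shell
#!/bin/sh
# Print the command line the wrapper would execute on the OCR server.
# Arguments 3 and 5 are the local input and output paths; only their
# file names are kept and relocated under /tmp on the remote side.
# build_remote_command is a hypothetical helper name.
build_remote_command() {
  input=$(basename "$3")
  output=$(basename "$5")
  printf '%s\n' "pdfsandwich $1 $2 /tmp/$input $4 /tmp/$output $6"
}

# Hypothetical invocation:
#   build_remote_command -lang eng /opt/alfresco/tmp/in.pdf -o /opt/alfresco/tmp/out.pdf -verbose
```

Because only the base names are kept, two OCR jobs working on files with the same name would share the same path under /tmp on the OCR server.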

An RSA key pair is required so that both servers can communicate over SSH without user interaction.


$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
$ ssh-copy-id -i ~/.ssh/id_rsa.pub alfresco-ocr.keensoft.es
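Before wiring the script into Alfresco, it is worth confirming that the login really works without any prompt. Run the check as the same operating system user that runs Alfresco, because that user's key is the one the script will use. This is a minimal sketch. check_passwordless_ssh and the SSH_BIN override are hypothetical names, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# ssh client to use; it can be overridden, which is handy for testing.
SSH_BIN="${SSH_BIN:-ssh}"

# Succeed only if root@HOST accepts a key-based login without prompting.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
  # BatchMode=yes makes ssh fail instead of asking for a password or passphrase
  "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Usage:
#   check_passwordless_ssh alfresco-ocr.keensoft.es && echo "key login OK"
```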

alfresco-global.properties has to be updated to use the above script when invoking pdfsandwich.


ocr.command=/opt/alfresco/scripts/ocr.sh

Alfresco is restarted to apply the configuration.

$ systemctl restart alfresco

Final words

Once our system is up and ready, OCR tasks are sent by Alfresco to the external OCR server, which allows Alfresco to maintain its quality of service.
Even increasing the capacity of the OCR service through the async thread pool defined in alfresco-global.properties will not impact the Alfresco server.


# Default Async Action Thread Pool
default.async.action.threadPriority=1
default.async.action.corePoolSize=8
default.async.action.maximumPoolSize=20

In the end, Alfresco quality of service depends not on which software is installed, but on how that software is installed.