Since the start of the 21st century, and even a couple of decades before, the world has come to be seen as a digital one, with growing pressure to get rid of analogue things and find digital replacements. When Johannes Gutenberg invented the printing press, or even earlier, when the Sumerian civilization invented writing, they could hardly have imagined that centuries later their inventions would be considered old-fashioned. Let’s assume this: paper is dead. We have certainly found a better surrogate for paper in screens and displays, whether on laptops, mobiles, tablets, or any other device capable of showing the equivalent of a paper document. In the same way that we had libraries, archives and other physical means to store and manage documents, we have also found what we needed in the digital world: ECM (Enterprise Content Management).
How important is search in an ECM?
With Enterprise Document Management we can store any kind of paper-like document, and also content that is nothing like paper, such as multimedia. This content could be a document, a record, or a document that later becomes a record… whatever. But once you store such documents, users will want to perform operations on them, and what is always the first operation on a document? It’s simple: finding it. That makes Search one of the key parts of an ECM. Search has to be fast, accurate and user friendly. Search has to cope with a large number of documents, since the more documents we bring into the system, the larger the index becomes. Search also needs to offer additional filters so that users can find documents based on their exact needs. Okay, we know now (if we didn’t before) how important Search is in an ECM.
How is Search in Alfresco or any other ECM?
Focusing on Alfresco, we have seen a clear and necessary evolution in Search from Alfresco 3 to Alfresco 4, when Alfresco decided to offer Solr as its preferred Search solution. Alfresco keeps offering Lucene because backward compatibility is still a requirement, but strongly recommends: ‘choose Solr unless your repository is a small one’. Solr brought great benefits. Even though performance has increased hugely with this switch of Search engine, there might be some cases where the solution is not 100% compliant with customer requirements or use cases. You can take a deeper look at those cases in the presentation link provided within the blog, but in a few words: users either have several Alfresco instances and want a real federated search across them, or have multiple repository types and want to search across all of them from the same platform. The main issue with this digital era we are talking about is that it is evolving tremendously fast. In the context of Search solutions it is pretty much the same: awesome cloud-like solutions are maturing and delivering fantastic results, even though they are not integrated with Alfresco or other ECMs at all.
Top three technologies: Solr Cloud, Elastic Search & Amazon Cloud Services
Most people will agree that the top three of these technologies are Solr Cloud, Elastic Search and Amazon Cloud Services, so users of an ECM might want to use one of them, or perhaps all of them! Why don’t we make this a reality? The Zaizi Solution! Building this ZAIZI solution was quite challenging, as we didn’t want to lose the chance to bring in either new repositories or any other Search solution which might emerge rapidly. With that premise and some other requirements in mind, our solution can be summarised by the following elements:
- Decouples the Search solution from Alfresco
- Allows you to implement different Search solutions
- Allows you to change the Search solution without changing anything in Alfresco – Not even a property!
- Provides an API to integrate it with Alfresco as a search engine – and even with other repository vendors, e.g. Filesystem, SharePoint, Documentum, FileNet, Drupal.
- Preserves security permissions in the results – Alfresco permissions are indexed and used during search, and this is part of our Sensefy 1.5 version!
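The decoupling described in the list above can be pictured with a minimal sketch. It is not the Sensefy API: `SearchBackend`, `InMemoryBackend` and `SearchService` are hypothetical names invented for illustration, and the toy backend stands in for a real engine. The only point is that the repository side talks to one interface while the engine behind it can be swapped.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical interface that every search engine adapter would satisfy."""

    def search(self, query: str) -> list[str]: ...


class InMemoryBackend:
    """Toy stand-in for a real engine; not an actual Solr or Elasticsearch client."""

    def __init__(self, name: str, docs: dict[str, str]) -> None:
        self.name = name
        self.docs = docs  # document id -> text

    def search(self, query: str) -> list[str]:
        # Naive case-insensitive substring match, just for illustration.
        q = query.lower()
        return sorted(doc_id for doc_id, text in self.docs.items() if q in text.lower())


class SearchService:
    """What the repository talks to; it never names a concrete engine."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def use(self, backend: SearchBackend) -> None:
        # Swapping engines happens here, not in the repository.
        self._backend = backend

    def search(self, query: str) -> list[str]:
        return self._backend.search(query)


docs = {"doc-1": "Quarterly invoice", "doc-2": "Project plan"}
service = SearchService(InMemoryBackend("engine-a", docs))
first = service.search("invoice")

# Switch to another engine; the calling code stays the same.
service.use(InMemoryBackend("engine-b", docs))
second = service.search("invoice")
```

In this sketch the caller issues the same `service.search(...)` call before and after the switch, which is the property the list above describes as changing the Search solution without touching the repository.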
To build such a solution we have relied, on one side, on our incredibly marvelous team and, on the other, on Apache ManifoldCF.
ManifoldCF – the Open Source project from the Apache Software Foundation
ManifoldCF is an Open Source project from the Apache Software Foundation whose main purpose is to connect source content repositories with target repositories or indexes. In order to remain flexible and extensible, Apache ManifoldCF is based on the use of connectors, of which there are three different types:
Repository Connectors extract information from repositories so that it can be used by Output Connectors. These connectors can sit on top of any data source capable of providing information. Some examples could be Alfresco itself, or any other ECM, providing information through either CMIS or another API, or even a database or a filesystem.
Output Connectors aim to store the information captured from one or several repository connectors, storing it in the way the target usually does. The target could be any of the Search solutions discussed in this post (e.g. Solr Cloud, Elastic Search and Amazon Cloud Search), or even Alfresco or any other ECM: any component which can provide storage services. Just note that the main point of using Output Connectors is being able to manage the information, and therefore search solutions are the best choice here, but if needed another Output type could be chosen just as easily.
Security Connectors make target repositories enforce the security policies of the sources. In other words, permissions from a filesystem, Alfresco or another ECM are also taken into account when the data is populated, so results are filtered according to the permissions of the source repository. For those with experience in Alfresco, this can be seen as similar to the way Solr indexes permissions from Alfresco and allows a permission check at query time.
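How the three connector types fit together can be sketched in a few lines. This is not the ManifoldCF API (which is Java): every class and method name below is hypothetical, and the sketch only assumes what the paragraphs above state, namely that a repository connector supplies documents with their permissions, an output connector stores them, and a security connector resolves what a user may see so that results are filtered at query time.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    allowed: frozenset[str]  # users or groups allowed to read, captured at crawl time


class ListRepositoryConnector:
    """Hypothetical repository connector: yields documents from a source."""

    def __init__(self, documents):
        self.documents = list(documents)

    def crawl(self):
        yield from self.documents


class StaticSecurityConnector:
    """Hypothetical security connector: resolves a user to access tokens."""

    def __init__(self, groups_by_user):
        self.groups_by_user = groups_by_user

    def authorities(self, user):
        return {user} | set(self.groups_by_user.get(user, ()))


class InMemoryOutputConnector:
    """Hypothetical output connector: stores documents together with their ACLs."""

    def __init__(self):
        self.index = {}

    def add(self, doc):
        self.index[doc.doc_id] = doc

    def query(self, term, authorities):
        # Keep a hit only if the caller holds at least one allowed token.
        term = term.lower()
        return sorted(
            doc.doc_id
            for doc in self.index.values()
            if term in doc.text.lower() and doc.allowed & authorities
        )


repository = ListRepositoryConnector([
    Document("hr-1", "Salary review", frozenset({"hr"})),
    Document("pub-1", "Holiday review calendar", frozenset({"hr", "staff"})),
])
security = StaticSecurityConnector({"alice": ["hr"], "bob": ["staff"]})
output = InMemoryOutputConnector()

for document in repository.crawl():
    output.add(document)

alice_hits = output.query("review", security.authorities("alice"))
bob_hits = output.query("review", security.authorities("bob"))
```

Both users run the same query, but `bob` only holds the `staff` token and therefore never sees the HR-only document.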
Once our research into the components was settled and we had decided to rely on Apache ManifoldCF for our purpose, there was still a lot of work to do, as many pieces of the puzzle did not yet fit. The most notable pieces of work are listed here, although the best news is still to come, since we plan to release some of these components to the community!
- New implementation of the Alfresco Repository Connector, removing the dependency on the current Alfresco Solr version.
- Output connector for Amazon Cloud Services
- Improvements and new features in the Elastic Search Connector – using an approach similar to the Alfresco Solr integration, managing ACL reads for Users and Groups in Alfresco.
Full text search on File Systems across our solution
And because we never stop, after the webinar and the number of questions it raised about permissions in a File System or Shared Folder, we are in a position to announce that we have modified the filesystem connector. Our solution is now aware of permissions when a File System is included in it. This gives users full text search on File Systems across our solution, with results filtered for the user who performs each request!
This component tracks the file system not only for changes in documents but also for changes in permissions, so permissions are updated whenever they change. The connector extracts and stores permissions from File Systems (Java 7 is required) and maps those permissions in the Output Connectors. When a request is made, our solution asks for the permissions of the user who performs the operation and makes the Output Connectors aware of them, so that they return filtered results. As can be seen, this approach is quite similar to the one used by Alfresco and Solr.
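The idea of tracking both document changes and permission changes can be illustrated with a small sketch. It is not the modified connector: the real component is Java based, whereas this sketch assumes POSIX mode bits as a stand-in for file system permissions, and the function names `snapshot` and `diff` are hypothetical.

```python
import os
import stat
import tempfile


def snapshot(root):
    """Map each file under root to (content fingerprint, permission bits)."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            content = (info.st_size, info.st_mtime_ns)  # cheap change indicator
            state[path] = (content, stat.S_IMODE(info.st_mode))
    return state


def diff(old, new):
    """Classify what changed between two snapshots."""
    changes = {}
    for path, (content, mode) in new.items():
        if path not in old:
            changes[path] = "added"
        elif old[path][0] != content:
            # A content change triggers a full re-index, permissions included.
            changes[path] = "content"
        elif old[path][1] != mode:
            changes[path] = "permissions"
    for path in old.keys() - new.keys():
        changes[path] = "removed"
    return changes


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "report.txt")
    with open(path, "w") as handle:
        handle.write("draft")
    os.chmod(path, 0o644)
    before = snapshot(root)

    os.chmod(path, 0o600)  # only the permissions change
    after = snapshot(root)
    changes = diff(before, after)
```

A file whose content is untouched but whose mode changed is reported as a permission change, so only its stored permissions would need refreshing in the index.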
Scalability is one of the premises of this solution
As you may have guessed, scalability is one of the premises of this solution. On the left side of the solution we can place any combination of the most relevant ECMs on the market today, any CMIS repository or even file systems. On the right side, the most outstanding and scalable search solutions. So imagine how scalable it is!
Another important feature of our solution is that we provide an API for integration, whether with Alfresco, with any other ECM system, or even with a custom UI like the one showcased in our webinar. In this UI, searching text across the different platforms is quite friendly and, equally important, results can be filtered by using configurable facets as well. The next screenshot shows the UI we are talking about, which is just one of the many front ends we could be using.
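As a rough illustration of what filtering by facets means, here is a minimal sketch. The result records, the field names `source` and `type`, and the helper functions are all made up for the example; in a real deployment the search engine would normally compute facet counts itself.

```python
from collections import Counter

# Hypothetical search hits coming from different repositories.
results = [
    {"id": "doc-1", "source": "alfresco", "type": "pdf"},
    {"id": "doc-2", "source": "filesystem", "type": "pdf"},
    {"id": "doc-3", "source": "alfresco", "type": "docx"},
]


def facet_counts(hits, fields):
    """Count how many hits fall under each value of each configured facet."""
    return {field: Counter(hit[field] for hit in hits) for field in fields}


def apply_filters(hits, selected):
    """Keep only the hits matching every facet value the user selected."""
    return [hit for hit in hits if all(hit[f] == v for f, v in selected.items())]


counts = facet_counts(results, ["source", "type"])
filtered = apply_filters(results, {"source": "alfresco", "type": "pdf"})
```

The list of fields passed to `facet_counts` plays the role of the facet configuration: changing it changes which filters the UI can offer.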
Since we want to give customers the best choice at any time, our next step is to benchmark these three search engines and work out which of them best fits which scenarios and use cases. That way we can give customers not only the chance to choose between them but also a recommendation on which one is best for their requirements.
We offer professional support to get you started!
Even though some of these components are to be released sooner rather than later, we keep one very important thing in place to offer the best quality to customers: our professional support! So, if you are interested and think we can help you, drop us an email and we will be happy to look into your exact requirements. We believe this solution is good enough to ask you to stay tuned. In return, we commit to keeping you posted on the news about the next generation of Search in Alfresco and other repositories. You know, Search Us! Contact Fran to learn more: @fran7alvarez!
by Fran Alvarez, Director, Zaizi Iberia