Saplo Text Analysis API Integration for CMS

October 28th, 2011 by Fredrik Hörte

When you use the Saplo Text Analysis API in a larger system, e.g. a content management system that is used daily in production, there are some best practices to consider when planning and developing against the Saplo API. Use this guide as a complement to the Saplo API documentation.

This guide will be maintained on our Saplo Developer topic site.

Cache results on client-side

When a result has been processed and returned from Saplo API (e.g. related texts or tags), store it on the client side. Don’t connect and make a request to Saplo API every time you are about to present a result to a user or on a website. Fetch the result from your own local storage instead.

Why? Otherwise you will run out of API calls. Reading from local storage also gives a much faster response time than a request to the API.
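
A minimal sketch of such a cache, assuming results are kept in a local SQLite table keyed by text id and method name. The `ResultCache` class, the `fetch` callable and the sample tag are made up for illustration; `fetch` stands in for whatever client code actually calls Saplo API.

```python
import json
import sqlite3
from typing import Callable


class ResultCache:
    """Local store for API results, keyed by text id and method name."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "text_id TEXT, method TEXT, payload TEXT, "
            "PRIMARY KEY (text_id, method))"
        )

    def get_or_fetch(self, text_id: str, method: str,
                     fetch: Callable[[str], object]) -> object:
        row = self.db.execute(
            "SELECT payload FROM results WHERE text_id = ? AND method = ?",
            (text_id, method),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # served locally, no API call
        result = fetch(text_id)  # hypothetical call into your API client
        self.db.execute(
            "INSERT INTO results (text_id, method, payload) VALUES (?, ?, ?)",
            (text_id, method, json.dumps(result)),
        )
        self.db.commit()
        return result


# Stand-in for the real API call; it records how often it is invoked.
calls = []

def fake_fetch_tags(text_id: str) -> list:
    calls.append(text_id)
    return [{"tag": "Stockholm", "category": "location", "relevance": 0.9}]

cache = ResultCache()
first = cache.get_or_fetch("article-1", "tags", fake_fetch_tags)
second = cache.get_or_fetch("article-1", "tags", fake_fetch_tags)
```

The second lookup is answered from the local table, so the stand-in fetch function runs only once.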

Store all data

Always store everything you get back from Saplo API. For example, when fetching tags for a text, store all tags regardless of relevance value, and store all the information the API returns about each tag, e.g. relevance and category.

Why? You never know what you want to do with your data in the future. Perhaps you want to search for tags that are related to each other, create trending tag lists, order by relevance or category, show 5 related texts one week and 20 another week.

Store data in its own data structure

Always try to keep a separate, persistent data structure for your metadata, with relations to your texts. You will also want the metadata to be indexed and searchable.

Why? If the metadata is scattered (e.g. in XML files, or stored together with each text), it is much harder to query. Example cases where it’s better to store all metadata in its own data structure:

  • You want to be able to find related and common tags by counting across all tags.
  • You want to update all tags from one certain name to another.
  • You want to get the trending tags for a time period.
  • You want to traverse through your related texts.
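
The cases above could be sketched with a small relational layout. This is only an illustration using SQLite; the table names, columns and sample rows are assumptions, not a schema prescribed by Saplo.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE texts (id INTEGER PRIMARY KEY, headline TEXT, published TEXT);
CREATE TABLE tags (
    text_id   INTEGER REFERENCES texts(id),
    name      TEXT,
    category  TEXT,
    relevance REAL
);
CREATE INDEX idx_tags_name ON tags(name);
""")

# Made-up sample data. Low-relevance tags are stored as well.
db.executemany("INSERT INTO texts VALUES (?, ?, ?)", [
    (1, "Election night", "2011-10-01"),
    (2, "Budget debate", "2011-10-20"),
    (3, "Old story", "2011-08-15"),
])
db.executemany("INSERT INTO tags VALUES (?, ?, ?, ?)", [
    (1, "Riksdag", "organisation", 0.9),
    (2, "Riksdag", "organisation", 0.4),
    (2, "Budget", "unknown", 0.8),
    (3, "Budget", "unknown", 0.7),
])
db.commit()


def trending_tags(start: str, end: str, limit: int = 10) -> list:
    """Most frequent tags on texts published between two ISO dates."""
    return db.execute(
        "SELECT tags.name, COUNT(*) AS n FROM tags "
        "JOIN texts ON texts.id = tags.text_id "
        "WHERE texts.published BETWEEN ? AND ? "
        "GROUP BY tags.name ORDER BY n DESC, tags.name LIMIT ?",
        (start, end, limit),
    ).fetchall()


def rename_tag(old: str, new: str) -> int:
    """Rename a tag everywhere and return the number of rows changed."""
    cur = db.execute("UPDATE tags SET name = ? WHERE name = ?", (new, old))
    db.commit()
    return cur.rowcount


october = trending_tags("2011-10-01", "2011-10-31")
renamed = rename_tag("Riksdag", "Sveriges riksdag")
```

With the tags in one table, both the trending list and the global rename are single queries, which would be hard to do if each text carried its own tag file.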

Blacklist

If you are using Saplo Tags, create a blacklist system that works on article level.

Why? There might be some tags you don’t want to show to a user, and these can be different for each text. It works like a hide/show function. You might still want to include the hidden tags in a search index.
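
One way to picture an article-level blacklist is a set of hidden tag names per text id, applied only when tags are displayed. The function names and sample tags below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# text_id -> tag names hidden for that text only
blacklist: Dict[str, Set[str]] = defaultdict(set)


def hide_tag(text_id: str, tag: str) -> None:
    blacklist[text_id].add(tag)


def visible_tags(text_id: str, tags: List[str]) -> List[str]:
    """Tags to display; the stored list itself is never modified."""
    hidden = blacklist.get(text_id, set())
    return [t for t in tags if t not in hidden]


# All tags stay in storage, so a search index can still use them.
stored = {
    "article-1": ["Sweden", "Saplo"],
    "article-2": ["Sweden", "Lund"],
}
hide_tag("article-1", "Sweden")

shown_1 = visible_tags("article-1", stored["article-1"])
shown_2 = visible_tags("article-2", stored["article-2"])
```

Hiding "Sweden" on one article leaves it visible on the other and leaves the stored data untouched.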

Handle manual changes

If an editor has changed the results, e.g. added, deleted or edited tags or related texts, you have to make sure you handle these cases. For instance, a text with edited results should only be re-tagged when an editor forces it manually.

Why? If an editor deletes a tag from the result and the text is then sent to be tagged again, the tag will reappear, which would be really annoying for the editor.
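
A possible way to guard against this is to record editor decisions separately from the automatic result and reapply them after every re-tag. The `TagState` class below is a hypothetical sketch of that idea, not part of Saplo API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TagState:
    auto: List[str] = field(default_factory=list)   # latest automatic result
    removed: Set[str] = field(default_factory=set)  # tags deleted by an editor
    added: List[str] = field(default_factory=list)  # tags added by an editor

    def apply_retag(self, fresh: List[str]) -> None:
        """Replace the automatic result; editor decisions are kept."""
        self.auto = list(fresh)

    def current(self) -> List[str]:
        """Automatic tags minus editor deletions, plus editor additions."""
        kept = [t for t in self.auto if t not in self.removed]
        return kept + [t for t in self.added if t not in kept]


state = TagState(auto=["Saplo", "Sweden"])
state.removed.add("Sweden")            # editor deletes a tag
state.added.append("Text analysis")    # editor adds a tag
state.apply_retag(["Saplo", "Sweden", "API"])  # text is tagged again

tags_after_retag = state.current()
```

The deleted tag stays deleted after the re-tag, while the new automatic tag and the editor's own tag are both kept.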

Indicator if text has “auto” result or if it has been approved manually

Use some kind of indicator so you know if a result has been reviewed or not.

Why? Some companies don’t want to push non-reviewed results to the GUI, though they want to have the result searchable and indexed automatically. This indicator can prevent a push to the GUI.
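
As a rough sketch, the indicator can be a status field stored next to each result. The names `ReviewStatus`, `TagResult`, `publishable` and `indexable` are invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ReviewStatus(Enum):
    AUTO = "auto"          # straight from the API, not yet reviewed
    APPROVED = "approved"  # checked by an editor


@dataclass
class TagResult:
    text_id: str
    tags: List[str]
    status: ReviewStatus = ReviewStatus.AUTO


def publishable(results: List[TagResult], allow_auto: bool = False) -> List[TagResult]:
    """Results that may be pushed to the GUI."""
    return [r for r in results
            if allow_auto or r.status is ReviewStatus.APPROVED]


def indexable(results: List[TagResult]) -> List[TagResult]:
    """Every result goes to the search index, reviewed or not."""
    return list(results)


results = [
    TagResult("article-1", ["Saplo"]),
    TagResult("article-2", ["CMS"], ReviewStatus.APPROVED),
]
gui = publishable(results)
index = indexable(results)
```

Only the approved result reaches the GUI here, while both results remain available for indexing.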

Multiple languages

Saplo API supports English and Swedish. You need to create two collections (one Swedish and one English) and use one of them as the default. If a text is not written in the language of the default collection, you will not be able to add it to that collection. Then you can add it to the other collection and get results. If it can’t be added to either collection, it has to be discarded.
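
The fallback could look roughly like this. How a rejected text is actually reported by Saplo API is not covered in this guide, so the `LanguageMismatch` exception and the `add_text` callable are assumptions standing in for your real client code.

```python
from typing import Callable, Optional, Sequence


class LanguageMismatch(Exception):
    """Hypothetical error: the collection rejected the text's language."""


def add_to_first_matching(text: str,
                          collections: Sequence[str],
                          add_text: Callable[[str, str], None]) -> Optional[str]:
    """Try each collection in order, default first.

    Returns the id of the collection that accepted the text,
    or None if the text has to be discarded.
    """
    for collection_id in collections:
        try:
            add_text(collection_id, text)
            return collection_id
        except LanguageMismatch:
            continue
    return None


def fake_add_text(collection_id: str, text: str) -> None:
    # Stand-in rule: a collection only accepts texts carrying its own marker.
    if not text.startswith("[" + collection_id + "]"):
        raise LanguageMismatch(collection_id)


order = ["sv", "en"]  # Swedish collection as the default
used_sv = add_to_first_matching("[sv] Hej världen", order, fake_add_text)
used_en = add_to_first_matching("[en] Hello world", order, fake_add_text)
used_none = add_to_first_matching("[de] Hallo Welt", order, fake_add_text)
```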

Batch job (cron job) and Response Time

Depending on text length and method, the response time can vary and may be longer than expected. A response (tagging, related texts) usually takes 2-5 seconds. Keep track of texts that have not yet received results so you can fetch the results later.
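
A batch job for this can be as simple as a list of pending text ids that a cron job walks through. In this sketch `fetch` is a hypothetical function that returns `None` while a result is not ready yet.

```python
from typing import Callable, Dict, List, Optional


def fetch_pending(pending: List[str],
                  fetch: Callable[[str], Optional[list]]) -> Dict[str, list]:
    """Try every pending text once; finished ones leave the pending list."""
    done: Dict[str, list] = {}
    for text_id in list(pending):  # iterate over a copy while removing
        result = fetch(text_id)
        if result is None:
            continue  # still processing, retry on the next run
        done[text_id] = result
        pending.remove(text_id)
    return done


# Stand-in for the API: only results present in `ready` are available.
ready = {"text-1": ["Saplo"]}

def fake_fetch(text_id: str) -> Optional[list]:
    return ready.get(text_id)

pending = ["text-1", "text-2"]
first_run = fetch_pending(pending, fake_fetch)

ready["text-2"] = ["CMS"]  # the second result becomes available later
second_run = fetch_pending(pending, fake_fetch)
```

Each run picks up whatever has finished since the last one, and the pending list shrinks until it is empty.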

Send Feedback to Saplo API

Saplo is built using machine learning. This means that the system gets better the more users it has and the more feedback it receives. For example, if a tag is wrong, we want to know. Prepare your system so it is easy to send feedback to Saplo API when a user corrects something.
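
One way to prepare for this is to log every editor correction in an outbox and deliver the entries later. The sketch below deliberately leaves the actual feedback call out: `send` is a hypothetical callable that you would implement against the feedback methods described in the Saplo API documentation.

```python
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Correction:
    text_id: str
    tag: str
    action: str  # "removed" or "added" by an editor


outbox: List[Correction] = []


def record_correction(text_id: str, tag: str, action: str) -> None:
    outbox.append(Correction(text_id, tag, action))


def flush(send: Callable[[dict], bool]) -> int:
    """Deliver queued corrections; entries that fail stay in the outbox."""
    sent = 0
    for correction in list(outbox):
        if send(asdict(correction)):
            outbox.remove(correction)
            sent += 1
    return sent


record_correction("article-1", "Sweden", "removed")
record_correction("article-1", "Lund", "added")

delivered = []

def fake_send(payload: dict) -> bool:
    delivered.append(payload)
    return True

sent_count = flush(fake_send)
```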

Flexibility for different users

What works for one user or site doesn’t have to be the correct setup for another. Create a settings file (or admin UI) where the site owner can change properties for their setup. Examples of properties that might be important:

  • Default input parameters for the different API methods (e.g. wait time, limits, thresholds)
  • How tags should be presented; alphabetically, by relevance, by category etc.
  • How related texts should be presented; by relevance, by date, how many etc.
  • If “auto” results can be pushed to GUI instantly.
  • Default collections to use in Saplo API
  • API and secret keys
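
A settings file for the properties above might be a small JSON document merged over defaults. The key names and default values here are invented for illustration and do not correspond to actual Saplo API parameters.

```python
import json

# Hypothetical defaults; a site owner overrides only what they need.
DEFAULTS = {
    "wait_seconds": 5,
    "tag_order": "relevance",       # or "alphabetical", "category"
    "related_order": "relevance",   # or "date"
    "related_limit": 5,
    "publish_auto_results": False,
    "default_collection": None,
    "api_key": None,
    "secret_key": None,
}


def load_settings(raw: str) -> dict:
    """Merge a JSON settings document over the defaults."""
    overrides = json.loads(raw)
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("unknown settings: " + ", ".join(sorted(unknown)))
    return {**DEFAULTS, **overrides}


site_settings = load_settings(
    '{"tag_order": "alphabetical", "related_limit": 20}'
)
```

Rejecting unknown keys catches typos in the settings file early, instead of silently falling back to a default.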