REST API for TKL Table Management

Have you ever been asked to write a WebReport to manage ADN or Attribute Extension Table Key Lookup (TKL) tables and values?  I had such a task for a client a few months back.  One of the key challenges was allowing users to add, delete, and change values, not just on items with TKL attributes, but in the TKL tables themselves.  Using "out-of-the-box" WebReports and LiveReports, it's not the easiest thing in the world to write an app that updates SQL tables and is at once flexible (works for any possible TKL attribute), safe (not vulnerable to SQL injection), and multi-platform (handles all manner of special characters and quotes in all major databases).

In fact, I was not at all satisfied with the "out-of-the-box" WR app, and felt it could be done better and more cleanly if only I had an API to manage the TKL tables that was abstracted away from the WebReport. So, I wrote a REST API for TKL tables.  There is a bit of setup to do (e.g. defining any identity or logical-delete fields in your tables, mapping default values for queries), but with this API you can write a WebReport that queries TKL tables and updates attribute values without the WR application needing to know anything about the underlying tables – all of that is defined in the configuration.  Also, all the underlying SQL uses bind variables rather than literal text, which guards against SQL injection. The calling app merely says: change/add/delete this value for this named attribute.
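
To make the bind-variable point concrete, here is a rough sketch of the difference, in Oscript. The table and column names are invented for illustration, and I'm assuming a prgCtx in scope and the usual CAPI.Exec() pattern with :A1/:A2 placeholders; the actual API does the equivalent inside its configuration layer.

// Values that arrived from the calling WebReport
String userValue = "New description"
String userKey = "KEY01"

// Unsafe: splicing user text straight into the statement
String badStmt = "UPDATE MY_TKL_TABLE SET DESCRIPTION = '" + userValue + "' WHERE KEYVALUE = '" + userKey + "'"

// Safer: fixed SQL text with bind placeholders; the values travel separately from the SQL
String stmt = "UPDATE MY_TKL_TABLE SET DESCRIPTION = :A1 WHERE KEYVALUE = :A2"
Dynamic result = CAPI.Exec( prgCtx.fDbConnect.fConnection, stmt, userValue, userKey )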

One obvious question is why a client would want this. Why wouldn't the DBA simply update the TKL tables in SQL Server Management Studio, JDeveloper, or SQL*Plus?  For this client, the answer is that the people making decisions about TKL values were not DB admins and needed a business-friendly way to manage the tables that underpinned their metadata model.

I'd be curious to know whether others see a use for such an API. So far, I've tested it on SQL Server 2012 and Oracle 11g for Content Server 10.0. I plan on releasing it for other versions if there is sufficient interest.

REST For Content Server Physical Objects

While working on a government contract, I encountered the Physical Objects module from OpenText, which encapsulates the details of a physical archive (think old books, microfilm, file folders – the kind that contain paper) and manages loaning the items out and tracking them.

One of my big tasks was to write WebReports that gave the client a user interface putting the data that was important to *them* at their fingertips. After many months, and several thousand lines of spaghetti code spanning God knows how many WebReport and LiveReport objects, I delivered a working interface that managed the aspects of Physical Inventory that mattered to them.

Ever since then, I've been looking at how I could have written a UI in WebReports that would be easier to maintain, with fewer peripheral objects and fewer lines of spaghetti code.  I believe the answer comes in the form of OpenText's new REST API.  I've learned from others that it is indeed possible to have a jQuery AJAX function within a WebReport consume the REST API: all you need to do is pass in the authentication token, which is available from a WebReport tag, and the REST call executes as the same user running the WebReport.

This got me thinking. Given that many of the steps in writing my own custom interface involved writing a lot of sub-reports, perhaps encapsulating what I had been doing in a REST API might be a worthwhile endeavor.  Imagine writing a WebReport-based UI that needs only a handful of reports to deliver key functionality instead of 20 or 30!  Think of how much easier that is to transport between development and production environments!  A REST API for Physical Inventory would also expose the functionality of the Physical Objects module to hand-held scanning devices, allowing dock workers to scan data into Content Server from the warehouse.

I have spent some time this week developing a REST API for Physical Objects. I have no idea whether OpenText has this on the roadmap – I believe that physical inventory management is not considered a growth industry. Nevertheless, within a few days, I was able to develop a REST API that can do the following:

  • Get basic information about a physical object, including its borrow status and its physical properties
  • Get or set any physical item property
  • Borrow the item
  • Return the item
  • Get a list of valid media types that can be children of this object.

I would be curious to hear from others whether they see any large appeal to such an API. At this point, I’ve only invested a few days in this, but if this is something people want to see, I’m prepared to invest more time in it.

Just released: A REST API for Content Server Workflow

Today, I am happy to announce that Hugh Ferguson Consulting Ltd. is releasing our latest product: REST API Extensions for Content Server Workflow (RESTWF).  OpenText has been promoting the REST API as the new way to quickly write responsive user interfaces.  The existing API allows for document/folder, category, and form operations. The one thing that is missing is workflow.  With RESTWF, it is now possible to launch workflows, load tasks, upload/download attachments, read or set attributes, and read or send comments.  Although OpenText has workflow extensions on its product roadmap, they are not part of the latest 10.5 build.

In the interest of generating some comments, and soliciting feedback, here is an overview of the API. There are three main parts:

  • assignments – API to access WF assignments, and to interact with individual tasks
  • wfpackages – API to access workflow data in an assigned task such as comments, attributes, and attachments
  • workflow – API to initiate a workflow

I will be publishing the API in the coming days on the corporate website at hferguson.ca.


Distributed Agent Job Chains – getting child results from Reduce() function

Geek alert: This blog post is highly technical and assumes a certain level of Oscript development skills.

In my last blog post, I discussed improvements made to the Distributed Agents framework in Update 201506. As a result of that post, John Simon asked whether the improvements extended to capturing the results of child tasks in the Reduce() function.  At the time, my research was inconclusive. Today I ran a little experiment on two CS 10.0 releases, one at U201503 and the other at U201506. The upshot is that both behaved identically, so if the behaviour I'm describing is an improvement, it was introduced as early as U201503 (anyone from OpenText DEV care to comment?).

In a nutshell, it is possible to get results to bubble up from child task to master task. This happens in the Reduce() function. Unfortunately, the aggregated results don't seem to make it to the Finalize task, which is where most developers would like to get aggregated data about the tasks that were executed.

So what aggregation *can* be done?

The Reduce() function gets data from two main places: the list of Assocs passed in as its one argument, and the .fTaskData feature on the Task object (i.e. your instance of your JobChain orphan).  Tragically, .fTaskData means something completely different depending on which function you are in. Within Split(), it's the original Assoc you passed in with your call to $DistributedAgent.MapReducePkg.Map(), but it is something else in Map(), Reduce(), and Finalize(). I summarize below:

[Table image: what .fTaskData contains in Split(), Map(), Reduce(), and Finalize()]


Yeah, it's a bit messed up, and no wonder Oscript developers are often confused about what data they are getting in the different functions of the JobChain task object. Before I get to how to make child data bubble up, let's look at what happens when a JobChain is kicked off. The sequence of events is:

  • Master task executes the Split() function, creating an Assoc for each portion of the job. This list of chunk task data Assocs is returned in the Split function's "set" feature (i.e. rtnVal.set).
  • Master task executes the Map() function for the first taskData in the set from above (yeah, kind of weird).
  • Child tasks are assigned for each remaining taskData in the set. That means if your Split() function only produced one chunk, everything gets executed in the master task.
  • Each child task executes its Map() function.  Any data you want to reach Reduce() should go into rtnVal.data of Map's return value.
  • Master task executes Reduce() and complains that child tasks are outstanding (this sometimes creates an endless loop in Builder, but it works fine when the server is running).
  • Each child executes Reduce().
  • Master task re-executes Reduce() and now succeeds.
  • Master task executes Finalize().

And that's the flow of execution.


Now, back to the question of how we aggregate data in the Reduce() function.  As you can see from above, the master Reduce() tries to execute first (it holds the first taskData generated in Split()). It exits because child tasks are still executing.  The details of how master and child tasks map to one another are always subject to change; for our purposes we don't need to know. In each child task's run of Reduce(), the key thing is that, at the end of the task, any details you want to reach the master Reduce() go into rtnVal.childStatus.  So let's assume a Map() function like the following, which returns an object count and a skipped count (i.e. the number of objects processed and skipped) in a data Assoc:

[Code image: the example Map() function]
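
Since the original screenshot isn't reproduced here, below is a minimal sketch of what such a Map() function could look like. The signature and the ok feature are assumptions on my part; the essential bit is that the counts are returned in a data Assoc on rtnVal, which is what Reduce() later sees in .fTaskData.

function Assoc Map()

    Assoc    rtnVal = Assoc.CreateAssoc()
    Assoc    data = Assoc.CreateAssoc()
    Integer  count = 0
    Integer  skipped = 0

    // ...process this task's chunk of work here, incrementing count for
    // each object handled and skipped for each object passed over

    data.count = count
    data.skipped = skipped

    rtnVal.ok = TRUE
    rtnVal.data = data

    return rtnVal
end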

The Reduce() function will get an .fTaskData that is the rtnVal from the function above. You can then copy the values of that data into Reduce's rtnVal.childStatus Assoc, i.e.:

Assoc childStatus = Assoc.CreateAssoc()

childStatus.data = .fTaskData.data
rtnVal.childStatus = childStatus

When the master task executes Reduce(), it gets a list of all the childStatus Assocs from all the child tasks.  If you loop through them, you get the results of each Map task. These results, along with the results stored in .fTaskData, give you all your task results in one location.

There is, of course, one small problem.  A child task and a master task for which no child tasks executed will appear identical – both will have an empty list for the childResults argument.

There is a cheap trick to solve this. We can take advantage of the fact that, in a JobChain, when our Split function divides the job into smaller pieces, the master task always takes the first item in the list – even if it's a list of one. If we assign a row number to each chunk's taskData in Split(), and return it from Map() in the results, then in Reduce() we can always tell from the row number whether we are in a child task or in a master task with no children. In Split() you'd do something like this:

RecArray rows = ...some SQL to get back our distributed results
Record row
List chunks = {}
Integer rowCount = 0

for row in rows
   Assoc chunkData = Assoc.Copy( .fTaskData )
   ...set whatever else you need to on chunkData
   chunkData.RowNumber = rowCount
   chunks = {@chunks, chunkData}
   rowCount += 1
end

// the list of chunk taskData Assocs is what goes back in rtnVal.set
rtnVal.set = chunks

In the Map() function, you would add a line like

data.RowNumber = .fTaskData.RowNumber

Now your Reduce function would look something like:

/**
* This method will reduce the task results.
*
* @param {List} childResults Results from the execution of child tasks
*
* @return {Assoc}
* @returnFeature {Boolean} ok FALSE if an error occurred, TRUE otherwise
* @returnFeature {String} errMsg Error message, if an error occurred
* @returnFeature {Assoc} childStatus Data this task wants to bubble up to the master task's Reduce()
* @returnFeature {Assoc} childStatus.data The count/skipped data produced by this task's Map()
* @returnFeature {RecArray} errDetail Detailed information about errors
*
**/

function Assoc Reduce( \
    List childResults )

    Assoc      rtnVal = Assoc.CreateAssoc()
    Assoc      childStatus = Assoc.CreateAssoc()
    Assoc      result
    Assoc      taskData = .fTaskData
    Assoc      data = taskData.data
    Boolean    ok = TRUE
    String     errMsg
    RecArray   errDetail

    // Assuming we set this up in Split(), RowNumber will always be 0 for the
    // master task
    Boolean    isMaster = IsDefined( data.RowNumber ) && data.RowNumber == 0 ? TRUE : FALSE

    // Get the results of the first task, which was executed as the master
    Integer    count = data.count
    Integer    skipped = data.skipped

    // Aggregate the rest of the results
    if IsDefined( childResults )
        for result in childResults
            // Assume our child counts are in data
            if IsDefined( result.data )
                count += result.data.count
                skipped += result.data.skipped
            end
        end
    end

    if isMaster
        // do something with aggregated count/skipped integers
    else
        // For the child task, push the data up so the master's Reduce() sees it
        childStatus.data = data
    end

    rtnVal.ok = ok
    rtnVal.errMsg = errMsg
    rtnVal.errDetail = errDetail
    rtnVal.childStatus = childStatus

    return rtnVal
end

Using the above Reduce() function, we know which instance of Reduce() is the master, and we receive the aggregated counts from the Map() tasks. Any other information, such as error details from the child tasks, could be passed the same way.  It is unfortunate that this information doesn't bubble up to Finalize(); perhaps that is what OpenText is still working on.

If you’ve read this far, you must be a veteran Oscripter 🙂  Comments are welcome, even to tell me I have something completely wrong.

Distributed Agent improvements in CS 10.X U 201506

As promised in a LinkedIn status update, here is something about the changes to Distributed Agents in the latest updates from OpenText.  Over the weekend, I installed the latest 10.0 release, and today I installed the latest 10.5 release.  At first, it wasn't obvious what had changed between them. However, the change is there, under System Administration –> Distributed Agent Status.  Unfortunately, it is only available in the 10.5 release of U201506, not the 10.0 release.  On this page, which previously just listed the status of the worker threads, you can now set a black-out period.  So what does this do for us exactly?

The handy thing about specifying a black-out period for DAs is that the most processing-intensive agents won't be consuming your system resources during business hours, when you want Content Server to be at its most responsive. The main DAs that come with Content Server are the ones that rebuild your facet and custom-column tables, and the ones that purge deleted items from DTree (now DTreeCore) – all operations that are potentially database intensive.

Presumably, a task that is running would not suddenly halt when the outage window arrives, but rather finish what it's doing and exit.  This should be fine, as DA tasks are intended to be short (i.e. 1–10 seconds in duration).  What is not clear is whether a job chain would simply pick up where it left off, or whether the job would need to be rescheduled. If it is the former, that would be pretty brilliant, and this would have little or no impact on third-party vendors.

This is a good feature, but the one drawback is that it is all or nothing.  That is, it prevents all DA tasks from running during an outage period. It is conceivable that facet and column updates would need to continue during the day while the purge tasks are deferred to after hours, or that a system administrator would want to stop facets from updating but still need columns to update during business hours (or vice versa).  Admittedly, this is more of an issue for any third-party Technology Partner that develops a module for Content Server that uses Distributed Agents.

Prior to the introduction of this outage window for DAs, partner developers would need to introduce their own semaphores to manage outages, such as not running during business hours. In my company's product, BMUP, outages are managed through an administration screen that stores a weekly configuration in the KINI table, plus a job table that keeps track of each DA task launched by user actions. Any task launched during a time when DA tasks aren't supposed to run is scheduled for later. For any job chain that is still running, each individual task detects the outage and exits, then the master task reschedules itself to carry on later. For the core DA job chains that OpenText provides, this level of control is probably overkill; after all, the column and facet jobs are spawned automatically, triggered by changes in Content Server.
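
As a rough illustration of that pattern (the helper name and the deferred flag below are hypothetical, not the actual BMUP code), each task's Map() can start with a guard that checks the configured outage window and, if the window is active, does no work and flags the result so the master task knows to reschedule the chain. How the master actually re-queues the job is elided here.

function Assoc Map()

    Assoc   rtnVal = Assoc.CreateAssoc()

    // .InOutageWindow() is a stand-in for whatever reads the stored
    // weekly configuration from the KINI table
    if .InOutageWindow( Date.Now() )
        // Do no work; signal that this chunk should be picked up again later
        rtnVal.ok = TRUE
        rtnVal.deferred = TRUE
        return rtnVal
    end

    // ...normal processing for this chunk goes here

    rtnVal.ok = TRUE
    rtnVal.deferred = FALSE

    return rtnVal
end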

Perhaps this is already in the works, but the next change from OpenText to the DA framework might be the ability to increase the granularity so that particular task types can have different outage periods.

I will be writing more later about programming with Distributed Agents, particularly because it is, in my view, an underrated and underutilized framework for doing some really good, scalable development in Content Server. Kudos to OpenText for allowing DAs to be restricted from running during pre-defined periods.

Welcome to the blog

This blog will be where I talk about various things in the ECM space, particularly focused on OpenText Content Server. My focus will be more development oriented: I will discuss Oscript, Web Services, REST, and WebReports development.