Complex Event Processing (CEP) - Thoughts

Tooling

At the tooling level, we need to define what do we want to provide in each step of the lifecycle and how will we be supporting each step.

 

Rules Authoring

This is the easy part . We currently (5.0) support authoring on both eclipse plugin and web (Guvnor). Although, not all features are supported in the guided editor interface. Some of the features are not specific to CEP, but general features that can be used for rules in general. So, I think we should make sure we support the following on the guided editor:

  • Entry points: I think Guvnor could have a simple entry point registration form and allow the users to select a pre-registered entry point for each pattern in their rules. I believe the pre-registration of the entry points would help avoid user mistakes when setting entry point names in rules. In the case of imported rules, Guvnor could use the rulebase API to query existing entry-points and also add them to the list of entry points (?!).
  • Operator parametrization: I am not sure that the guided editor supports operator parametrization at this point. I am also not sure where it looks for the list of operators. We need to start using the registered operator API to look for the available operators to present to the user and allow it to add parameters to operators that support parameters, specially temporal operators (e.g.: after[1s, 10s]). If necessary, we can improve the operator API to make any other necessary information available.
  • Sliding windows: although this is a more complex concept to grasp, I think we should support it at the guided editor. I know Michael had some ideas on metaphores for sliding windows that could be used to make this concept easier for highlevel users. Either we go this route or allow full sliding window declaration at the guided editor.
  • Accumulate and accumulate functions: this is not a complex concept and it is something users really use. As so, we need to make sure we support accumulate and all configured accumulate functions in the guided editor. We should not, however, support the custom code accumulate. Only accumulate functions, as a way to incentivate people to write reusable accumulate functions and eliminate semantic code from the rules definition.

 

Testing

Now the tricky part starts. Proper event testing require control of the session clock. How can we present a GUI for a user to properly parameterize a test is the first question. Second is how can we properly configure a Drools instance with multiple entry-points fed by an event feeder to test scenarions?

 

Event Generator

A specific event generator is quite easy to code as can be seen here:

 

https://svn.jboss.org/repos/labs/labs/jbossrules/trunk/drools-examples/drools-examples-fusion/src/main/java/org/drools/examples/broker/events/EventGenerator.java

 

Question is how do we create a general (enough) event generator for testing?

 

Interface Configuration (input/output integration points)

Ideally we should have a way to create, configure and deploy the integration points using the pipeline API. We probably need an execution server for that. Another nice to have feature would be to allow for user friendly mapping of text, CSV, XML and database tables into declared types, so that deploying Fusion in such scenarios would be easy to do.

 

Execution Server

We need to define and create an execution server environment that exposes services and integrates with interface points like JMS queues, filesystem event sources, and database event sources. Exposing entry points as service end-points would also be a must.

For the execution server to work, we need to spec what would be our deployment artifacts. Guvnor snapshots contain everything needed from a rulebase perspective, but fusion also requires interface (entry/exit points) configuration and connection.

I need to research JBAS microcontainer and deployment model to check if we can create an execution server on top of it. Also need to check with Michael if this execution server overlaps in any way with the work he is doing.

 

Language Features:

 

  • entry-point attribute for rules: define a rule attribute entry-point. This attribute would be the default entry-point for all patterns in that rule, unless explicitly defining a different entry-point. This would override the default "from working memory" implicit entry-point, possibly making the rule less verbose.
  • multi-function accumulate: accumulate must support multi-functions, so that several metrics for the same set of data can be calculated at the same time, increasing performance and throughput.
  • event initialization: we need to define ways to do event initialization for declared types. Also, we need to define a friendly syntax to generate events from a rule LHS declaration.
  • network JIT: need to check the status on network JIT and work on that as a means to increase throughput, specially on alpha network.
  • tumbling window behaviour: a windows that advance in batch.
  • unique/group by behaviours: similar to SQL features
  • sequence CE: we need a way to write rules that are computational eficient on executing the followed-by constraint on events, i.e., a feature that allow the user to declare that he is looking for a set of events in a given sequence. I was thinking and there are a few options to support that, like implementing it as a CE, as an accumulate function or as a constraint. My main goal is to have an implementation that is computational efficient. Eventually, the feature might need to somehow fallback into a state machine algorithm. Really need to think about this.

 

 

Clock abstractions

 

What semantics should be assigned to the Clock abstraction?

At this point, the goal is to abstract the possible semantics through an interface and provide pluggable implementations for different semantics. Out of the box, we intent to provide:

  • Realtime Clock: a clock that periodically synchronizes with the machine clock - DONE

  • Pseudo Clock: a generic implementation that is arbitrarily updated by application code - DONE

  • Heartbeat Clock: a clock that is updated every time a given event arrives. Useful for session synchronization.

  • Event Attribute Clock: a clock that is updated by a configured event attribute.

We also need to think about "composite-clock" strategies. For instance, in a fault recovery procedure, it would be necessary to use a PseudoClock or HeartbeatClock to replay the event backlog, but the application should be allowed to switch to RealtimeClock as soon as the backlog is consumed.