EventStreamAnalysis

Version 12

Created by mark.proctor on Oct 11, 2006 10:02 PM. Last modified by tirelli on Dec 7, 2007 7:48 AM.

I've been looking over the Esper docs to see the sort of syntax needed for event stream analysis and have the following proposal.

http://esper.codehaus.org/evaluating/tutorial/tutorial.html

First we need a scheduler to say how often a rule executes. Each time the scheduler fires, it just asserts an Event object, similar to how a query works now with the QueryObject. There is also no reason why these constructs cannot be used in standard queries as well, to allow for manual calling.

"returns the average stock price for the last 30 seconds"

select avg(price) from StockTickEvent.win:time(30)

Currently Drools uses a long counter for a fact id. As an alternative, that long could represent time: by default an asserted fact gets its time from System.currentTimeMillis(), or the user can specify the creation time of the fact. Users can also specify a termination time; if none is specified, the fact lives forever. We can then use that information to provide reasoning that includes time. So we can extend the current accumulate support to achieve the above as follows:

$averagePrice : Integer() from accumulate( StockTickEvent( $price : price ).win:time(30),
    init( int total = 0; int count = 0; )
    action( total += $price; count++; )
    result( total / count ) )
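
To make the intended semantics of the time-window accumulate concrete, here is a minimal plain-Java sketch that does not depend on the Drools engine. The class `TimeWindowAverage` and its methods are hypothetical names chosen for illustration. Timestamps are passed in explicitly (in milliseconds) and are assumed to arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Drools: keeps the ticks seen in the last
// windowMillis milliseconds and reports their average price.
class TimeWindowAverage {
    private static final class Tick {
        final long timestamp;
        final int price;
        Tick(long timestamp, int price) {
            this.timestamp = timestamp;
            this.price = price;
        }
    }

    private final long windowMillis;
    private final Deque<Tick> ticks = new ArrayDeque<>();
    private long total; // running sum of the prices currently held in the window

    TimeWindowAverage(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Plays the role of action(...): store the tick and update the running total.
    // Assumes ticks are added in non-decreasing timestamp order.
    void add(long timestamp, int price) {
        ticks.addLast(new Tick(timestamp, price));
        total += price;
    }

    // Evicts ticks that are windowMillis old or older, then plays the role of result(...).
    // Returns 0.0 for an empty window (an arbitrary choice for this sketch).
    double average(long now) {
        while (!ticks.isEmpty() && ticks.peekFirst().timestamp <= now - windowMillis) {
            total -= ticks.removeFirst().price;
        }
        return ticks.isEmpty() ? 0.0 : (double) total / ticks.size();
    }
}
```

Unlike the accumulate above, the sketch keeps a running total and subtracts expired ticks, so the average does not have to be recomputed from scratch each time the window slides.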

"return the average stock price for the last 100 ticks, grouped by symbol"

select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol

For this we will have to make a new bean:

class Group {
String name;
int total;
int count;
}

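$averagePrice : Map() from accumulate( $tick : StockTickEvent( $symbol : symbol, $price : price ).win:length(100),
    init( Map<String, Group> groups = new HashMap<String, Group>(); )
    action( Group group = groups.get( $symbol );
            if ( group == null ) { group = new Group(); group.name = $symbol; groups.put( $symbol, group ); }
            group.total += $price;
            group.count++; )
    result( Map<String, Integer> averages = new HashMap<String, Integer>();
            for ( Group group : groups.values() ) {
                averages.put( group.name, group.total / group.count );
            }
            return averages; ) )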
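
The grouped length-window query can also be sketched in plain Java, outside of any rule engine. `LengthWindowGroupAverage` is a hypothetical class for illustration only. It keeps the last N ticks across all symbols, as `win:length(100) group by symbol` does, and computes one average per symbol on demand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: per-symbol average price over the last `capacity` ticks overall.
class LengthWindowGroupAverage {
    private static final class Tick {
        final String symbol;
        final int price;
        Tick(String symbol, int price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    private final int capacity;
    private final Deque<Tick> window = new ArrayDeque<>();

    LengthWindowGroupAverage(int capacity) {
        this.capacity = capacity;
    }

    // Adds a tick and evicts the oldest one once the window is over capacity.
    void add(String symbol, int price) {
        window.addLast(new Tick(symbol, price));
        if (window.size() > capacity) {
            window.removeFirst();
        }
    }

    // Groups the ticks currently in the window by symbol and averages each group.
    Map<String, Double> averages() {
        Map<String, int[]> groups = new HashMap<>(); // symbol -> {total, count}
        for (Tick t : window) {
            int[] g = groups.computeIfAbsent(t.symbol, k -> new int[2]);
            g[0] += t.price;
            g[1]++;
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, int[]> e : groups.entrySet()) {
            result.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }
}
```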

select fraud.accountNumber as accntNum, fraud.warning as warn, withdraw.amount as amount,
MAX(fraud.timestamp, withdraw.timestamp) as timestamp, 'withdrawlFraud' as desc
from FraudWarningEvent.win:time(1800) as fraud,
WithdrawalEvent.win:time(30) as withdraw
where fraud.accountNumber = withdraw.accountNumber

For this, accumulate will need to be extended to allow multiple patterns, which it currently does not support.

$list : List() from accumulate( $f : FraudWarningEvent( $accountNumber : accountNumber ).win:time(1800),
                                $w : WithdrawalEvent( accountNumber == $accountNumber ).win:time(30),
    init( List list = new ArrayList(); )
    action( list.add( new Object[] { $f, $w } ); )
    result( list ) )
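
As a rough illustration of what a multi-pattern accumulate over two windows would have to compute, the following plain-Java sketch joins two event lists on account number, with each list restricted to its own time window. `FraudJoinSketch`, `Event` and `join` are hypothetical names; window sizes are in milliseconds and `now` is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a two-window join; not how the engine would implement it.
class FraudJoinSketch {
    static final class Event {
        final String accountNumber;
        final long timestamp;
        Event(String accountNumber, long timestamp) {
            this.accountNumber = accountNumber;
            this.timestamp = timestamp;
        }
    }

    // Returns {fraudWarning, withdrawal} pairs that share an account number,
    // where each event is still inside its own window relative to `now`.
    static List<Event[]> join(List<Event> frauds, List<Event> withdrawals, long now,
                              long fraudWindowMillis, long withdrawWindowMillis) {
        List<Event[]> pairs = new ArrayList<>();
        for (Event f : frauds) {
            if (f.timestamp <= now - fraudWindowMillis) {
                continue; // fraud warning has left its window
            }
            for (Event w : withdrawals) {
                if (w.timestamp <= now - withdrawWindowMillis) {
                    continue; // withdrawal has left its window
                }
                if (f.accountNumber.equals(w.accountNumber)) {
                    pairs.add(new Event[] { f, w });
                }
            }
        }
        return pairs;
    }
}
```

The nested loop is the naive form of the join. An engine would more plausibly index one side by account number, but the pairs produced are the same.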

Looking at this, there seem to be a number of common operations on sets of data. We should allow for user-pluggable functions so that these can be specified more declaratively. For instance, we could implement an "avg" accumulate plugin as follows:

$averagePrice : Integer() from avg( StockTickEvent( $price : price ).win:length(100), $price )

The first parameter is the pattern, and each parameter after that is passed to the avg accumulate function. The accumulate function will need to implement methods to handle init, action and result. The result should be a flexible and more powerful declarative system.
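
One possible shape for such a plugin contract is sketched below in Java. The interface `AccumulatePlugin`, the class `AveragePlugin` and the `AccumulateRunner` driver are hypothetical names invented for this sketch and should not be read as an existing Drools API. The only thing taken from the proposal is the init / action / result split.

```java
import java.util.List;

// Hypothetical plug-in contract: C is the accumulation context, R the result type.
interface AccumulatePlugin<C, R> {
    C init();                              // create a fresh context
    void action(C context, Number value);  // fold one matched value into the context
    R result(C context);                   // produce the final value
}

// "avg" expressed against the hypothetical contract above.
class AveragePlugin implements AccumulatePlugin<double[], Double> {
    public double[] init() {
        return new double[2]; // {total, count}
    }

    public void action(double[] context, Number value) {
        context[0] += value.doubleValue();
        context[1]++;
    }

    public Double result(double[] context) {
        // Returns 0.0 when nothing matched (an arbitrary choice for this sketch).
        return context[1] == 0 ? 0.0 : context[0] / context[1];
    }
}

// Minimal driver standing in for the engine: feeds every value to the plugin.
class AccumulateRunner {
    static <C, R> R run(AccumulatePlugin<C, R> plugin, List<? extends Number> values) {
        C context = plugin.init();
        for (Number value : values) {
            plugin.action(context, value);
        }
        return plugin.result(context);
    }
}
```

With this split, the engine only needs to know the three methods, and functions such as sum, min or max could be added without touching the rule language.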

You'll notice that I focus on the "where" clause and not the result columns. That's because the rows are created in the result(...) part, so a user can massage the results into any object they wish.

Window definition syntax

We discussed a few possible window definition syntaxes. It is important to note that a pattern without window definitions looks like this:

[<var-bind> : ] <Pattern-Type>( [<constraints>] ) [from <source>]

<> : means must be replaced by the corresponding data
[] : means optional

We need to add window definition capabilities to patterns.

Option 1: StreamingSQL-like syntax:

[<var-bind> : ] <Pattern-Type>( [<constraints>] )[.<behavior>]* [from <source>]

e.g:
FraudWarningEvent($accountNumber:accountNumber ).win:time( 10 sec )
FraudWarningEvent($accountNumber:accountNumber ).win:time( 10 sec ).distinct

Advantages:

concise syntax
allows the combination of multiple behaviors
avoids parser keyword creation

Disadvantages:

we intended to use a similar syntax for field bindings
does not allow different connectors between behaviors

Option 2: English-friendly syntax

[<var-bind> : ] <Pattern-Type>( [<constraints>] ) [ with <behavior>[,<behavior>]*] [from <source>]

e.g:
FraudWarningEvent($accountNumber:accountNumber ) with win:time( 10 sec )
FraudWarningEvent($accountNumber:accountNumber ) with win:time( 10 sec ), distinct

Advantages:

makes the syntax more readable
allows the combination of multiple behaviors
demands the creation of only one keyword (we need to decide which keyword... the example above used "with")
allows future use of different connectors between behaviors

Disadvantages:

more verbose (!?)

Option 3: Behaviors inside the pattern definition

[<var-bind> : ] <Pattern-Type>( ["["<behavior>[,<behavior>]*"]",][<constraints>] ) [from <source>]

e.g:
FraudWarningEvent( [ win:time( 10 sec ) ], $accountNumber:accountNumber )
FraudWarningEvent( [ win:time( 10 sec ), distinct ], $accountNumber:accountNumber )

Advantages:

clear behavior scoping
allows the combination of multiple behaviors
avoids parser keyword creation
allows future use of different connectors between behaviors

Disadvantages:

possibly confusing syntax (!?)

Clock abstractions

What semantics should be assigned to the Clock abstraction?

At this point, the goal is to abstract the possible semantics through an interface and provide pluggable implementations for different semantics. Out of the box, we intend to provide:

System Clock: a clock that periodically synchronizes with the machine clock
Event based clock: a clock that is updated every time a given event arrives
Pseudo Clock: a generic implementation that is arbitrarily updated by application code
Event Attribute Clock: a clock that is updated by a configured event attribute
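
A minimal Java sketch of what such a pluggable clock interface could look like is given below. All names (`EngineClock`, `SystemEngineClock`, `PseudoEngineClock`, `EventAttributeEngineClock`) are hypothetical. The system clock here simply reads the machine clock on every call rather than synchronizing periodically, and the rule that the event attribute clock never moves backwards is an assumption of this sketch, not something decided above.

```java
// Hypothetical clock abstraction; names are for illustration only.
interface EngineClock {
    long currentTimeMillis();
}

// Simplest form of a system clock: delegates to the machine clock on every call.
class SystemEngineClock implements EngineClock {
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

// Pseudo clock: time only changes when application code advances it.
class PseudoEngineClock implements EngineClock {
    private long now;

    public long currentTimeMillis() {
        return now;
    }

    long advance(long millis) {
        now += millis;
        return now;
    }
}

// Event attribute clock: time is taken from a timestamp attribute of incoming events.
class EventAttributeEngineClock implements EngineClock {
    private long now;

    public long currentTimeMillis() {
        return now;
    }

    // Assumption of this sketch: out-of-order events never move the clock backwards.
    void onEvent(long eventTimestamp) {
        now = Math.max(now, eventTimestamp);
    }
}
```

A pseudo clock of this kind is mainly useful for tests and replay, since window expiry can then be driven deterministically from application code.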

Temporal relational operators

Correlating events over time requires the definition of several time-enabled relational operators. These operators also depend on the time semantics one chooses for the implementation.

With our goal of having the most expressive rule language in the market, we decided to support interval-based semantics for time reasoning. Based on the paper "Unified Semantics for Event Correlation Over Time and Space in Hybrid Network Environments" by Eiko Yoneki and Jean Bacon, we will support the following operators and their negations:

A before B : event A finishes before B starts
A after B : event A starts after B finishes
A equals B : event A starts and finishes at the same time as B
A meets B : event A finishes at the exact time B starts
A overlaps B : event A starts before B starts and finishes after B starts but before B finishes
A starts B : event A starts at the same time as B
A finishes B : event A finishes at the same time as B
A includes B : event A starts before or at the same time as B starts and finishes after or at the same time as B finishes

A more formal specification, by Matthias, follows:

All this goes back to Allen's interval-based semantics, defined in 1983 in his paper "Maintaining Knowledge about Temporal Intervals". The paper defines 13 relationships: the "equals" relation, plus 6 further relationships and their inverses (e.g., "after" is the inverse of "before"). I wrote down how to formally interpret the visual representation of all 13 relationships that Allen gives on page 4 of the paper (see the box at the upper right corner; the pictorial example illustrates the situation best), i.e. I expressed them in the format you suggested, using the start and end times of the events.

Here we go (of course, E.startTS <= E.endTS must hold for every event E).

 
A equals B:             (A.startTS == B.startTS) && (A.endTS == B.endTS)
 
A before B:             (A.endTS < B.startTS)
A after B:              (A.startTS > B.endTS)
 
A meets B:              (A.endTS == B.startTS)
A met-by B:             (A.startTS == B.endTS)
 
A overlaps B:           (A.startTS < B.startTS) && (A.endTS > B.startTS) && (A.endTS < B.endTS)
A overlapped-by B:      (A.startTS > B.startTS) && (A.startTS < B.endTS) && (A.endTS > B.endTS)
 
A during B:             (A.startTS > B.startTS) && (A.endTS < B.endTS)
A contains B:           (A.startTS < B.startTS) && (A.endTS > B.endTS)
 
A starts B:             (A.startTS == B.startTS) && (A.endTS < B.endTS)
A started-by B:         (A.startTS == B.startTS) && (A.endTS > B.endTS)
 
A finishes B:           (A.startTS > B.startTS) && (A.endTS == B.endTS)
A finished-by B:        (A.startTS < B.startTS) && (A.endTS == B.endTS)
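
The 13 relationships translate directly into predicates over start and end timestamps. The Java sketch below uses hypothetical names (`Interval`, `AllenRelations`); each inverse is written as the base relation with its arguments swapped. For overlaps and overlapped-by it follows Allen's definition, which also requires that the two intervals actually intersect (A.endTS > B.startTS for overlaps), so that the relationships stay mutually exclusive.

```java
// Hypothetical value type: an event occurrence with start and end timestamps.
final class Interval {
    final long startTS;
    final long endTS;

    Interval(long startTS, long endTS) {
        if (startTS > endTS) {
            throw new IllegalArgumentException("startTS must be <= endTS");
        }
        this.startTS = startTS;
        this.endTS = endTS;
    }
}

// Allen's 13 interval relationships as static predicates.
final class AllenRelations {
    private AllenRelations() {}

    static boolean equalTo(Interval a, Interval b) {
        return a.startTS == b.startTS && a.endTS == b.endTS;
    }

    static boolean before(Interval a, Interval b) { return a.endTS < b.startTS; }
    static boolean after(Interval a, Interval b) { return before(b, a); }

    static boolean meets(Interval a, Interval b) { return a.endTS == b.startTS; }
    static boolean metBy(Interval a, Interval b) { return meets(b, a); }

    // A starts first, the intervals intersect, and B finishes last.
    static boolean overlaps(Interval a, Interval b) {
        return a.startTS < b.startTS && b.startTS < a.endTS && a.endTS < b.endTS;
    }
    static boolean overlappedBy(Interval a, Interval b) { return overlaps(b, a); }

    static boolean during(Interval a, Interval b) {
        return a.startTS > b.startTS && a.endTS < b.endTS;
    }
    static boolean contains(Interval a, Interval b) { return during(b, a); }

    static boolean starts(Interval a, Interval b) {
        return a.startTS == b.startTS && a.endTS < b.endTS;
    }
    static boolean startedBy(Interval a, Interval b) { return starts(b, a); }

    static boolean finishes(Interval a, Interval b) {
        return a.startTS > b.startTS && a.endTS == b.endTS;
    }
    static boolean finishedBy(Interval a, Interval b) { return finishes(b, a); }
}
```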

Referenced by:

The following languages have been recommended as alternative research points to Esper: AQuery and EasyLanguage.

JBossDeveloper
