Version 12

    I've been looking over the Esper docs to see the sort of syntax needed for event stream analysis, and I have the following proposal.

    http://esper.codehaus.org/evaluating/tutorial/tutorial.html

     

    First we need a scheduler to say how often a rule executes. Each time the scheduler kicks off, it simply asserts an Event object, similar to how a query works now with the QueryObject. There is also no reason why these constructs could not be used in standard queries as well, to allow rules to be triggered manually.
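    A minimal Java sketch of the scheduler idea, under stated assumptions: `Session` is a hypothetical stand-in for the working memory (it only has an `insert` method), and `TickEvent` and `RuleScheduler` are hypothetical names, not existing Drools classes. Every period the scheduler asserts one `TickEvent`, much as a query is triggered by asserting a QueryObject, and `fire(long)` lets application code trigger the same tick manually.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a working memory: only insertion is needed here.
interface Session {
    void insert(Object fact);
}

// Hypothetical fact asserted on every scheduler tick; rules can match on it.
final class TickEvent {
    final String ruleName;
    final long timestamp;

    TickEvent(String ruleName, long timestamp) {
        this.ruleName = ruleName;
        this.timestamp = timestamp;
    }
}

// Periodically asserts a TickEvent for one rule, similar to asserting a QueryObject.
final class RuleScheduler {
    private final Session session;
    private final String ruleName;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    RuleScheduler(Session session, String ruleName) {
        this.session = session;
        this.ruleName = ruleName;
    }

    // Manual trigger: asserts a single TickEvent stamped with the given time.
    void fire(long now) {
        session.insert(new TickEvent(ruleName, now));
    }

    // Automatic trigger: fires every periodMillis milliseconds until stop() is called.
    ScheduledFuture<?> start(long periodMillis) {
        return executor.scheduleAtFixedRate(
                () -> fire(System.currentTimeMillis()),
                0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

    The periodic path calls `insert` from the executor thread, so a real session would have to tolerate insertion from that thread.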

     

    "returns the average stock price for the last 30 seconds"

    select avg(price) from StockTickEvent.win:time(30)
    

     

    Currently Drools uses a long counter for a fact id. An alternative would be for that long to represent time: by default, asserts get their time from System.currentTimeMillis(), or the user can specify the creation time of the fact. Users can also specify a termination time; if none is specified, the fact lives forever. We can then use that information to support reasoning that includes time. So we can extend the current accumulate support to achieve the above as follows:

    $averagePrice : Integer() from accumulate( StockTickEvent( $price : price ).win:time(30),
        init( int total = 0; int count = 0; ),
        action( total += $price; count++; ),
        result( new Integer( count == 0 ? 0 : total / count ) ) )
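    To make the fact-id-as-time idea concrete, here is a minimal Java sketch assuming a hypothetical `TimedFact` wrapper that carries a creation time and an optional termination time (`Long.MAX_VALUE` meaning "lives forever"). `WindowedAverage.averagePrice` mirrors the init, action and result blocks of an accumulate over a `win:time` window; all class and method names are illustrative assumptions, and times are in milliseconds.

```java
import java.util.List;

// Hypothetical fact handle: instead of a plain counter id it carries a creation
// time and an optional termination time (Long.MAX_VALUE means "lives forever").
final class TimedFact<T> {
    final T object;
    final long startTs;
    final long endTs;

    TimedFact(T object, long startTs, long endTs) {
        this.object = object;
        this.startTs = startTs;
        this.endTs = endTs;
    }

    // Default assert: stamped with the current time and never terminated.
    static <T> TimedFact<T> assertNow(T object) {
        return new TimedFact<>(object, System.currentTimeMillis(), Long.MAX_VALUE);
    }

    // True if the fact is still alive at 'now' and was created within the last windowMillis.
    boolean inTimeWindow(long now, long windowMillis) {
        return startTs > now - windowMillis && startTs <= now && endTs >= now;
    }
}

// Hypothetical stock tick fact with only the field the example needs.
final class StockTickEvent {
    final int price;

    StockTickEvent(int price) {
        this.price = price;
    }
}

final class WindowedAverage {
    // Mirrors the accumulate blocks: init, action per matching fact, then result.
    static int averagePrice(List<TimedFact<StockTickEvent>> facts, long now, long windowMillis) {
        int total = 0;                                   // init
        int count = 0;
        for (TimedFact<StockTickEvent> f : facts) {
            if (f.inTimeWindow(now, windowMillis)) {     // .win:time(...)
                total += f.object.price;                 // action
                count++;
            }
        }
        return count == 0 ? 0 : total / count;           // result (0 for an empty window)
    }
}
```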
    

    "returns the average stock price for the last 100 ticks, grouped by symbol"

    select symbol, avg(price) as averagePrice from StockTickEvent.win:length(100) group by symbol

     

    For this we will have to make a new bean:

    class Group {
    String name;
    int total;
    int count;
    }
    

     

    $averages : Map() from accumulate( StockTickEvent( $symbol : symbol, $price : price ).win:length(100),
        init( Map groups = new HashMap(); Map averages = new HashMap(); ),
        action( Group g = (Group) groups.get( $symbol );
                if ( g == null ) { g = new Group(); g.name = $symbol; groups.put( $symbol, g ); }
                g.total += $price;
                g.count++;
                averages.put( $symbol, new Integer( g.total / g.count ) ); ),
        result( averages ) )
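    The grouped query can be sketched in plain Java as well. This assumes a hypothetical `Tick` fact with `symbol` and `price` fields and reuses the `Group` bean; the hypothetical `LengthWindowGroupBy` keeps only the most recent N ticks, which is one reading of `win:length(100)`, and recomputes the per-symbol averages from the window contents.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// The Group bean from the proposal: running total and count for one symbol.
final class Group {
    String name;
    int total;
    int count;
}

// Hypothetical tick with the two fields the query uses.
final class Tick {
    final String symbol;
    final int price;

    Tick(String symbol, int price) {
        this.symbol = symbol;
        this.price = price;
    }
}

// Keeps the last 'length' ticks (win:length) and averages them per symbol (group by).
final class LengthWindowGroupBy {
    private final int length;
    private final Deque<Tick> window = new ArrayDeque<>();

    LengthWindowGroupBy(int length) {
        this.length = length;
    }

    void insert(Tick tick) {
        window.addLast(tick);
        if (window.size() > length) {
            window.removeFirst();           // the oldest tick leaves the window
        }
    }

    // init / action / result over the current window contents.
    Map<String, Integer> averages() {
        Map<String, Group> groups = new HashMap<>();
        for (Tick t : window) {
            Group g = groups.computeIfAbsent(t.symbol, s -> {
                Group created = new Group();
                created.name = s;
                return created;
            });
            g.total += t.price;
            g.count++;
        }
        Map<String, Integer> result = new HashMap<>();
        for (Group g : groups.values()) {
            result.put(g.name, g.total / g.count);   // count >= 1 for every group
        }
        return result;
    }
}
```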
    

     

    select fraud.accountNumber as accntNum, fraud.warning as warn, withdraw.amount as amount,
    MAX(fraud.timestamp, withdraw.timestamp) as timestamp, 'withdrawlFraud' as desc
    from FraudWarningEvent.win:time(1800) as fraud,
    WithdrawalEvent.win:time(30) as withdraw
    where fraud.accountNumber = withdraw.accountNumber
    

     

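    This query matches withdrawals from the last 30 seconds against fraud warnings from the last 30 minutes on the same account. To express it, accumulate will need to be extended to allow multiple patterns, which it currently does not support.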

    $pairs : List() from accumulate( $f : FraudWarningEvent( $accountNumber : accountNumber ).win:time(1800),
        $w : WithdrawalEvent( accountNumber == $accountNumber ).win:time(30),
        init( List list = new ArrayList(); ),
        action( list.add( new Object[] { $f, $w } ); ),
        result( list ) )
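    A nested-loop join is one straightforward way to picture what a multi-pattern accumulate would compute. The sketch below assumes hypothetical `FraudWarningEvent` and `WithdrawalEvent` classes with timestamps in seconds, and collects `Object[] { fraud, withdrawal }` pairs as the action block does; the window sizes are parameters, so `join(frauds, withdrawals, now, 1800, 30)` corresponds to the query above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event types; timestamps are in seconds to match win:time(1800) / win:time(30).
final class FraudWarningEvent {
    final String accountNumber;
    final long timestamp;

    FraudWarningEvent(String accountNumber, long timestamp) {
        this.accountNumber = accountNumber;
        this.timestamp = timestamp;
    }
}

final class WithdrawalEvent {
    final String accountNumber;
    final int amount;
    final long timestamp;

    WithdrawalEvent(String accountNumber, int amount, long timestamp) {
        this.accountNumber = accountNumber;
        this.amount = amount;
        this.timestamp = timestamp;
    }
}

final class FraudJoin {
    // Pairs every in-window fraud warning with every in-window withdrawal on the same account.
    static List<Object[]> join(List<FraudWarningEvent> frauds, List<WithdrawalEvent> withdrawals,
                               long now, long fraudWindow, long withdrawalWindow) {
        List<Object[]> list = new ArrayList<>();                       // init
        for (FraudWarningEvent f : frauds) {
            if (f.timestamp <= now - fraudWindow) continue;            // outside the fraud window
            for (WithdrawalEvent w : withdrawals) {
                if (w.timestamp <= now - withdrawalWindow) continue;   // outside the withdrawal window
                if (w.accountNumber.equals(f.accountNumber)) {         // accountNumber == $accountNumber
                    list.add(new Object[] { f, w });                   // action
                }
            }
        }
        return list;                                                   // result
    }
}
```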
    

    Looking at this, there seem to be a number of common operations on sets of data. We should allow for user-pluggable functions that can be specified to handle these more declaratively. For instance, we could implement an "avg" accumulate plugin as follows:

    $averagePrice : Integer() from avg( StockTickEvent( $price : price ).win:length(100), $price )
    

    The first parameter is the pattern, and each parameter after that is passed to the avg accumulate function. The accumulate function will need to implement methods to handle init, action and result. The result should be a more flexible and more powerful declarative system.
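    One possible shape for such a plug-in contract, sketched in Java: `AccumulateFunction`, `AvgContext`, `AvgFunction` and `Accumulator` are hypothetical names chosen for illustration, not an existing Drools API. The engine would call `init` once, `action` for every value bound by the pattern (each `$price`), and `result` to produce the object bound to `$averagePrice`.

```java
import java.util.List;

// Hypothetical plug-in contract: one method per accumulate block.
interface AccumulateFunction<C, V, R> {
    C init();                          // init( ... )
    void action(C context, V value);   // action( ... ), once per matching value
    R result(C context);               // result( ... )
}

// Mutable state for the average function.
final class AvgContext {
    long total;
    int count;
}

// "avg" plug-in: the pattern supplies $price, the function does the rest.
final class AvgFunction implements AccumulateFunction<AvgContext, Integer, Integer> {
    public AvgContext init() {
        return new AvgContext();
    }

    public void action(AvgContext ctx, Integer value) {
        ctx.total += value;
        ctx.count++;
    }

    public Integer result(AvgContext ctx) {
        return ctx.count == 0 ? 0 : (int) (ctx.total / ctx.count);
    }
}

final class Accumulator {
    // Drives any plug-in over the values bound by the pattern.
    static <C, V, R> R accumulate(List<V> values, AccumulateFunction<C, V, R> fn) {
        C ctx = fn.init();
        for (V v : values) {
            fn.action(ctx, v);
        }
        return fn.result(ctx);
    }
}
```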

     

    You'll notice that I focus on the "where" clause and not the result columns. That's because the rows are created in the result(...) part, so a user can massage the results into any object they wish.

     

    Window definition syntax

     

    We discussed a few possible window definition syntaxes. Note that a pattern without window definitions looks like this:

    [<var-bind> : ] <Pattern-Type>( [<constraints>] ) [from <source>]
    
    <> : means must be replaced by the corresponding data
    [] : means optional
    

     

    We need to add window definition capabilities to patterns.

     

    Option 1: StreamingSQL-like syntax:

     

    [<var-bind> : ] <Pattern-Type>( [<constraints>] )[.<behavior>]* [from <source>]
    
    e.g:
    FraudWarningEvent($accountNumber:accountNumber ).win:time( 10 sec )
    FraudWarningEvent($accountNumber:accountNumber ).win:time( 10 sec ).distinct
    

     

    Advantages:

    • concise syntax

    • allows the combination of multiple behaviors

    • avoids parser keyword creation

     

    Disadvantages:

    • clashes with the similar syntax we intended to use for field bindings

    • does not allow different connectors between behaviors
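    Chaining behaviors, as Option 1 does with `.win:time( 10 sec ).distinct`, amounts to applying them in order. A minimal Java sketch, assuming a hypothetical `Event` with an account number and a timestamp in seconds, and assuming "distinct" keeps the first event per account number (the proposal does not define it); `Behaviors` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical event: an account number plus a timestamp in seconds.
final class Event {
    final String accountNumber;
    final long timestamp;

    Event(String accountNumber, long timestamp) {
        this.accountNumber = accountNumber;
        this.timestamp = timestamp;
    }
}

final class Behaviors {
    // win:time(seconds): keep only events newer than now - seconds.
    static UnaryOperator<List<Event>> winTime(long seconds, long now) {
        return events -> {
            List<Event> kept = new ArrayList<>();
            for (Event e : events) {
                if (e.timestamp > now - seconds) kept.add(e);
            }
            return kept;
        };
    }

    // distinct (assumed meaning): keep the first event seen for each account number.
    static UnaryOperator<List<Event>> distinct() {
        return events -> {
            Set<String> seen = new HashSet<>();
            List<Event> kept = new ArrayList<>();
            for (Event e : events) {
                if (seen.add(e.accountNumber)) kept.add(e);
            }
            return kept;
        };
    }

    // Applies the behaviors left to right, like .win:time(...).distinct
    @SafeVarargs
    static List<Event> apply(List<Event> events, UnaryOperator<List<Event>>... behaviors) {
        List<Event> current = events;
        for (UnaryOperator<List<Event>> b : behaviors) {
            current = b.apply(current);
        }
        return current;
    }
}
```

    Because each behavior consumes the output of the previous one, the order matters: applying `distinct` before `win:time` could keep an older duplicate that the window then drops, losing a recent event for that account.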

     

    Option 2: English-friendly syntax

    [<var-bind> : ] <Pattern-Type>( [<constraints>] ) [ with <behavior>[,<behavior>]*] [from <source>]
    
    e.g:
    FraudWarningEvent($accountNumber:accountNumber ) with win:time( 10 sec )
    FraudWarningEvent($accountNumber:accountNumber ) with win:time( 10 sec ), distinct
    

     

    Advantages:

    • makes the syntax more readable

    • allows the combination of multiple behaviors

    • requires only one new keyword (which keyword is still to be decided; the examples above use "with")

    • allows future use of different connectors between behaviors

     

    Disadvantages:

    • more verbose (!?)

     

    Option 3: Behaviors inside the pattern definition

    [<var-bind> : ] <Pattern-Type>( ["[" <behavior>[,<behavior>]* "]",] [<constraints>] ) [from <source>]
    
    e.g:
    FraudWarningEvent( [ win:time( 10 sec ) ], $accountNumber:accountNumber )
    FraudWarningEvent( [ win:time( 10 sec ), distinct ], $accountNumber:accountNumber )
    

     

    Advantages:

    • clear behavior scoping

    • allows the combination of multiple behaviors

    • avoids parser keyword creation

    • allows future use of different connectors between behaviors

     

    Disadvantages:

    • possibly confusing syntax (!?)

     

    Clock abstractions

     

    What semantics should be assigned to the Clock abstraction?

    At this point, the goal is to abstract the possible semantics behind an interface and provide pluggable implementations for the different semantics. Out of the box, we intend to provide:

    • System Clock: a clock that periodically synchronizes with the machine clock

    • Event based clock: a clock that is updated every time a given event arrives

    • Pseudo Clock: a generic implementation that is arbitrarily updated by application code

    • Event Attribute Clock: a clock that is updated by a configured event attribute
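    A sketch of the clock abstraction in Java, assuming a hypothetical `SessionClock` interface with a single `getCurrentTime()` method returning milliseconds. `SystemClock` here reads the machine clock directly rather than synchronizing periodically, `PseudoClock` only moves when application code advances it, and `EventAttributeClock` takes its time from a configured event attribute; never letting it move backwards on out-of-order events is an assumption of this sketch. An event-based clock would look like the last one, using the arrival time instead of an attribute. All names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.ToLongFunction;

// Hypothetical clock contract shared by time windows and temporal operators.
interface SessionClock {
    long getCurrentTime();   // milliseconds
}

// System clock: simplified to read the machine clock on every call.
final class SystemClock implements SessionClock {
    public long getCurrentTime() {
        return System.currentTimeMillis();
    }
}

// Pseudo clock: only moves when application code advances it.
final class PseudoClock implements SessionClock {
    private final AtomicLong time = new AtomicLong();

    public long getCurrentTime() {
        return time.get();
    }

    long advanceTime(long millis) {
        return time.addAndGet(millis);
    }
}

// Event attribute clock: time comes from a configured attribute of each arriving event.
final class EventAttributeClock<E> implements SessionClock {
    private final ToLongFunction<E> attribute;
    private long time;

    EventAttributeClock(ToLongFunction<E> attribute) {
        this.attribute = attribute;
    }

    // Assumption: out-of-order events never move the clock backwards.
    void onEvent(E event) {
        time = Math.max(time, attribute.applyAsLong(event));
    }

    public long getCurrentTime() {
        return time;
    }
}
```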

     

    Temporal relational operators

     

    Correlating events over time requires the definition of several time-enabled relational operators. These operators also depend on the time semantics chosen for the implementation.

    Since our goal is to have the most expressive rule language on the market, we decided to support interval-based semantics for time reasoning. Based on the paper "Unified Semantics for Event Correlation Over Time and Space in Hybrid Network Environments" by Eiko Yoneki and Jean Bacon, we will support the following operators and their negations:

     

    • A before B : event A finishes before B starts

    • A after B : event A starts after B finishes

    • A equals B : event A starts and finishes at the same time as B

    • A meets B : event A finishes at the exact time B starts

    • A overlaps B : event A starts before B starts, and finishes after B starts but before B finishes

    • A starts B : event A starts at the same time as B

    • A finishes B : event A finishes at the same time as B

    • A includes B : event A starts before or at the same time as B starts and finishes after or at the same time as B finishes

     

    What follows is a more formal specification, by Matthias:

     

    All this goes back to Allen's interval-based semantics, defined in 1983 in his paper "Maintaining Knowledge about Temporal Intervals", in which the 13 relationships are defined. In fact, we have the "equals" relation plus 6 further relationships and their inverses (e.g., "before" is the inverse relationship of "after"). I wrote down how to formally interpret the visual representation of all 13 relationships given by Allen on page 4 of the paper (see the box in the upper right corner; the pictorial example illustrates the situation best), i.e. I expressed them in the format you suggested, using the start and end times of the events.

    Here we go (of course, for every event E it must hold that E.startTS <= E.endTS):

     

     
    A equals B:             (A.startTS == B.startTS) && (A.endTS == B.endTS)
     
    A before B:             (A.endTS < B.startTS)
    A after B:              (A.startTS > B.endTS)
     
    A meets B:              (A.endTS == B.startTS)
    A met-by B:             (A.startTS == B.endTS)
     
    A overlaps B:           (A.startTS < B.startTS) && (A.endTS > B.startTS) && (A.endTS < B.endTS)
    A overlapped-by B:      (A.startTS > B.startTS) && (A.startTS < B.endTS) && (A.endTS > B.endTS)
     
    A during B:             (A.startTS > B.startTS) && (A.endTS < B.endTS)
    A contains B:           (A.startTS < B.startTS) && (A.endTS > B.endTS)
     
    A starts B:             (A.startTS == B.startTS) && (A.endTS < B.endTS)
    A started-by B:         (A.startTS == B.startTS) && (A.endTS > B.endTS)
     
    A finishes B:           (A.startTS > B.startTS) && (A.endTS == B.endTS)
    A finished-by B:        (A.startTS < B.startTS) && (A.endTS == B.endTS)
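    The 13 relations translate directly into predicates. The Java sketch below assumes a hypothetical `Interval` type with `startTs` and `endTs` fields and keeps the predicates in a map, so that the hypothetical `AllenOperators.matching(a, b)` lists every relation that holds. For proper intervals (start strictly before end) exactly one relation should hold; this requires "overlaps" to include A.endTS > B.startTS, otherwise it would also match pairs that are "before" or "meets". For point events (start == end) several relations can hold at once, e.g. "meets" and "starts".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical interval event; startTs <= endTs must hold.
final class Interval {
    final long startTs;
    final long endTs;

    Interval(long startTs, long endTs) {
        if (startTs > endTs) throw new IllegalArgumentException("startTs must be <= endTs");
        this.startTs = startTs;
        this.endTs = endTs;
    }
}

final class AllenOperators {
    // Insertion order follows the specification above.
    static final Map<String, BiPredicate<Interval, Interval>> RELATIONS = new LinkedHashMap<>();

    static {
        RELATIONS.put("equals", (a, b) -> a.startTs == b.startTs && a.endTs == b.endTs);
        RELATIONS.put("before", (a, b) -> a.endTs < b.startTs);
        RELATIONS.put("after", (a, b) -> a.startTs > b.endTs);
        RELATIONS.put("meets", (a, b) -> a.endTs == b.startTs);
        RELATIONS.put("met-by", (a, b) -> a.startTs == b.endTs);
        RELATIONS.put("overlaps", (a, b) -> a.startTs < b.startTs && a.endTs > b.startTs && a.endTs < b.endTs);
        RELATIONS.put("overlapped-by", (a, b) -> a.startTs > b.startTs && a.startTs < b.endTs && a.endTs > b.endTs);
        RELATIONS.put("during", (a, b) -> a.startTs > b.startTs && a.endTs < b.endTs);
        RELATIONS.put("contains", (a, b) -> a.startTs < b.startTs && a.endTs > b.endTs);
        RELATIONS.put("starts", (a, b) -> a.startTs == b.startTs && a.endTs < b.endTs);
        RELATIONS.put("started-by", (a, b) -> a.startTs == b.startTs && a.endTs > b.endTs);
        RELATIONS.put("finishes", (a, b) -> a.startTs > b.startTs && a.endTs == b.endTs);
        RELATIONS.put("finished-by", (a, b) -> a.startTs < b.startTs && a.endTs == b.endTs);
    }

    // Names of all relations that hold between a and b.
    static List<String> matching(Interval a, Interval b) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, BiPredicate<Interval, Interval>> e : RELATIONS.entrySet()) {
            if (e.getValue().test(a, b)) names.add(e.getKey());
        }
        return names;
    }
}
```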
    

     

     


    AQuery and EasyLanguage have been recommended as alternative research points to Esper.