Plans for Hibernate Search 4

 

Hibernate Search 4 is the release where the following should happen:

  • split packages into API, SPI and private packages
  • use JBoss Logging
  • be compliant with Core 4
  • break whatever contract we need to break to open up the future
  • split dependency between the core of Hibernate Search and Hibernate Core - HSEARCH-677
  • review architecture and configuration inconsistencies

 

Do you see more tasks for 4?

 

Changing contracts

We have had a few contracts that we wanted to change to make way for future improvements:

  • should a bridge know about the field it changes (make the optimization more efficient)
  • API change with backend communication HSEARCH-757 - DONE
    • don't send Document to better control serialization - related to HSEARCH-681 - DONE
      • design a Document surrogate to keep flexibility
    • send "update" operation instead of "delete + add" - DONE
      • no need to actually send "update" verbs, but make sure the backend knows how to deal with "update" - DONE
  • DirectoryProvider - HSEARCH-758  - DONE (90% via HSEARCH-750 - see Sanne)
    • make an "IndexManager" instead, which is able to provide factories for both IndexReader and IndexWriter - DONE
    • add utility methods like "getName()" - DONE
      • (wish I had that in some cases to provide better error messages)
    • Instead of trying to foresee all needed methods, the extension point should not be the IndexManager interface directly, but have people plug in different aspects.
    • This is needed to eventually support:
      • Instantiated indexes - HSEARCH-336
      • make good use of all new so called "Near-Real-Time" Lucene improvements by having IndexReader and IndexWriter possibly generated by the same service - HSEARCH-759 - DONE
      • reuse the JGroups channel used by the Infinispan Directory as transport for a new backend (again, the two components need to interact) - HSEARCH-882
  • ReaderProvider - HSEARCH-760 - DONE
    • redesign the interface to provide a Reader for a single index, not a MultiReader
      • useful to simplify resource tracking
      • make it easy (possible) to cleverly integrate per-index caches with its Reader lifecycle
  • Backends and Workers scope - HSEARCH-761  - DONE
    • Make it possible to configure different backends per index (there can currently be only one backend)
      • tune different backends for different purposes
      • make it possible to have sync / async on different entities/indexes.
      • use different clustering/replication options or even technologies per index
  • Current defaults to change:
    • default to NumericFields for numeric properties - HSEARCH-763
    • remove the different notion of batch / transactional IndexWriter configuration setting - HSEARCH-743 - DONE
      • assume exclusive_index_use=true as default - HSEARCH-762 - DONE
  • Mapping changes
    • current DocumentBuilder builds the target Document considering mostly what was inferred from the static annotations, this misses:
      • relations to types which at runtime are actually subclasses instead of annotated type
      • maximum depth calculation is "different", as it's currently based on the model, not on the data depth (different during cycles)
      • no need to throw an exception to get out of a circularity
      • org.hibernate.search.util.ScopedAnalyzer logic will likely need to be re-thought; in the end the field-names / analyzer mapping will likely still be static, or we should be able to detect possible inconsistencies and report them - OR support them and store in the metadata which analyzers are to be applied per field on each Work instance.
    • Bridges
      • as we'll not be able to expose org.apache.lucene.document.Document instances directly to user-defined-bridges because of the different format to be sent to the backend, all bridges will need to write to the alternative container instead. - This turned out to not be true but do we want to shield people from Document anyways?
  • ClassBridge and DynamicBoost
    • would be nice if they could (optionally?) report back on which fields they are to be considered "dirty", i.e. which fields from the values they are being passed they are actually interested in
      • not likely to be used in Hibernate Search 4.0 but would make all sort of optimisations possible later on
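
As a rough illustration of the "IndexManager" idea above (one component per index that exposes a name and hands out both the read side and the write side), here is a minimal sketch. Every type and method name in it (ReaderSource, WorkSink, InMemoryIndexManager, and so on) is hypothetical and chosen for illustration only; this is not the actual Hibernate Search SPI, and strings stand in for Lucene readers and index operations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are the actual Hibernate Search SPI.
public class IndexManagerSketch {

    /** Read side: a reader for one single index, not a MultiReader. */
    public interface ReaderSource {
        String openReader();
    }

    /** Write side: receives the operations queued for one index. */
    public interface WorkSink {
        void apply(String operation);
    }

    /** One manager per index, owning both sides so that they can cooperate. */
    public interface IndexManager {
        String getName(); // handy for error messages
        ReaderSource readers();
        WorkSink writer();
    }

    /** Toy implementation: a list of strings stands in for the index. */
    public static final class InMemoryIndexManager implements IndexManager {
        private final String name;
        private final List<String> applied = new ArrayList<>();

        public InMemoryIndexManager(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        public ReaderSource readers() {
            // The reader sees the writer's state because both live in the same manager.
            return () -> "reader[" + name + "] sees " + applied.size() + " operation(s)";
        }

        public WorkSink writer() {
            return applied::add;
        }
    }

    public static void main(String[] args) {
        IndexManager books = new InMemoryIndexManager("books");
        books.writer().apply("update #1");
        System.out.println(books.readers().openReader());
    }
}
```

Keeping the reader and the writer behind the same manager is what would let an implementation share state between them, which is the point of the Near-Real-Time item in the list.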

 

 

Can you help collect the list of changes you would like to see happening?

 

I would like to get this work started asap. This is really the unknown quantity, and we tend to be slow to converge on these things.

 

Split packages in API/SPI/private packages - DONE

Hibernate Search 4 is the ideal time to properly split classes into API, SPI and private packages. Moving classes to private packages is the least disruptive move for users, as these classes should not be used directly. The API / SPI split is sometimes difficult to do, so if you have a doubt in an area, ask on the ML or on IRC and we can discuss it together. If you need an example, check out the query engine. It is relatively clean now.

 

We might have to break a few user APIs, which is fine, but I don't expect many breaks to be necessary:

  • make sure to discuss it when you plan to do one
  • list them in the migration guide

 

I'd say that the package splitting should be done whenever you make a change in a specific area you are already working on. It's more of a background task.

 

Be compliant with Core 4 - DONE

We can do this one a bit later in the cycle to give time for core to mature.

 

Split dependency between Hibernate Search and Hibernate Core

I think in practice we are not too far off. This work should be done in parallel with the package splitting. If you look at the query engine, we already have Hibernate-specific packages. We also have a HibernateHelper class for all low-level Hibernate contracts such as unproxying, initializing, etc. We should use that class everywhere instead of relying directly on the Hibernate Core contracts. That will help us turn this class into an implementable contract.
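
To make the "implementable contract" idea a bit more concrete, here is a minimal sketch of what such a contract could look like. The interface name EntityHelper, its method names and the PlainObjectHelper implementation are all hypothetical; they are not the actual HibernateHelper API. A Hibernate Core backed implementation would delegate to Core for proxies and lazy loading, which is left out here so the sketch has no dependencies.

```java
// Hypothetical sketch; names are illustrative, not the actual HibernateHelper API.
public class EntityHelperSketch {

    /** The low-level operations the engine needs, expressed as a contract. */
    public interface EntityHelper {
        Object unproxy(Object entity);

        Class<?> getEntityClass(Object entity);

        void initialize(Object entity);
    }

    /** Implementation for plain objects: nothing is proxied, nothing is lazy. */
    public static final class PlainObjectHelper implements EntityHelper {
        public Object unproxy(Object entity) {
            return entity; // a plain object is already the real instance
        }

        public Class<?> getEntityClass(Object entity) {
            return entity.getClass();
        }

        public void initialize(Object entity) {
            // nothing to load for a plain object
        }
    }

    public static void main(String[] args) {
        EntityHelper helper = new PlainObjectHelper();
        System.out.println(helper.getEntityClass("some entity").getName());
    }
}
```

With the engine coded only against the interface, the Hibernate Core specific implementation could later move to its own package without touching the callers.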

 

The next step is potentially to move the Hibernate Core specific code into a separate package. I don't have a strong opinion on this, but we should definitely discuss it.

 

Use JBoss Logging - DONE

Done.

 

Review architecture and configuration

 

There are several inconsistencies in how we describe and configure Search. One example is the term 'backend', which we treat like a single configuration option, but it is really a more general concept that requires changing several options together. From IRC:

 

     1:51pm hardy: the term 'backend'
      1:54pm hardy: we have a directory provider and reader strategy, etc
      1:55pm sannegrinovero: suggestion?
      1:55pm hardy: these are single components you can configure. basically implementation classes of a specific interface
      1:55pm hardy: 'backend' is somehow different
      1:56pm hardy: there is hibernate.search.<indexName>.worker.backend, but choosing a backend is more
      1:56pm hardy: in most cases just changing the hibernate.search.<indexName>.worker.backend property does not make sense
      1:56pm sannegrinovero: well it's not always true that you can configure a different reader strategy either, some IndexManagers might hardcode a specific impl.
      1:56pm hardy: you have to change other values as well, like the directory provider or the groups config, etc
      1:57pm hardy: sure
      1:57pm hardy: i am not saying we have to change things and I don't have a concrete suggestion for a change
      1:58pm hardy: i just want to ball these thoughts to see whether others (you) also agree that there is a potential mismatch
      1:58pm sannegrinovero: well that's where HSEARCH-791 will kick in, since to make use of Infinispan you'll have to select specific configuration details of other services too, it will make it easier hardcoding the only valid option.
      1:58pm jbossbot: jira HSEARCH-791 Make it easier to setup Infinispan with partially generated dynamic configurations Open (Unresolved) New Feature, Major, Sanne Grinovero https://hibernate.onjira.com/browse/HSEARCH-791
      2:00pm sannegrinovero: so in that case you'll select "infinispan indexmanage" or wathever the name will be as the IndexManager, and you won't need to specify a backend or a readerprovider, as it will bind you to the specific ones.
      2:00pm hardy: i like this idea
      2:00pm sannegrinovero:
      2:00pm sannegrinovero: I've long considered that configuring all the moving parts for clustering is far too hard
      2:01pm sannegrinovero: and it should be possible with a single configuration line
      2:01pm hardy: right, I think that's what I mean
      2:01pm sannegrinovero: so that's what I'm aiming for, but this issue won't be solved before we can go to Infinispan 5.1
      2:01pm hardy: i see
      2:01pm sannegrinovero: as I had found several blockers on the road
      2:02pm hardy: of course it should not only target infinispan, but also jms
      2:02pm hardy: etc
      2:02pm sannegrinovero: right, we could make one already to integreate the JMS approach
      2:02pm hardy: or jgroups
      2:03pm sannegrinovero: yes, but I'd do JMS first as JGRoups is more interesting in combination with Infinispan
      2:03pm hardy: i see
      2:03pm sannegrinovero: I won't expect people to usee JGroups over JMS with the master/slave approach
      2:03pm hardy: no?
      2:03pm hardy: maybe not
      2:04pm sannegrinovero: well it's possible, but it lacks many guarantees
      2:04pm sannegrinovero: you'll never be sure if the packets are lost
      2:04pm sannegrinovero: it supports ACK and orderding, but if the master is down it won't persist them.
      2:05pm sannegrinovero: So getting back to explaining the "backend" concept:
      2:05pm hardy: ok
      2:05pm sannegrinovero: I'd focus that to power users who want to write their own *backend* only and reuse one of the base IndexManager to wire all pieces together
      2:06pm sannegrinovero: So in this light it might still make sense to re-vamp the name, but is it worth it? it also affects some class names and lots of comments in the code.
      2:07pm hardy: maybe we should add a note along the lines "backend a term describing a higher level configuration and switching a backend requires to configure multiple properties. refer to XYZ"
      2:08pm hardy: as you say, probably best to keep things for now, but it seems you also see the "problem" and even thought already about some improvements
      2:10pm sannegrinovero: yes but the improvements I've thought about unfortunately won't make it for 4.0 it seems
      2:11pm sannegrinovero: I'd say don't spend too much time on this docs area, I hope we'll revive the backend stuff in 4.1
      2:11pm hardy: sure
      2:11pm sannegrinovero: hopefully at that point we won't even need to document this property
      2:11pm hardy: it's a shame though, since 4.0 should have been the version where we should have worked this out
      2:12pm sannegrinovero: right. tbh I didn't think about doing it on the JMS thing first, that would have been a nice to have.
      2:12pm sannegrinovero: maybe still time for it? I won't have time
      2:13pm hardy: was the latest with the release anyways?
      2:13pm hardy: we need to wait for Core right?
      2:14pm hardy: did you guys decide on a date yesterday? next week?
      2:15pm sannegrinovero: no decisions made, yes waiting on core.
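
The mismatch discussed in the log can be illustrated with a small sketch: switching to a JMS backend touches several properties at once, not just worker.backend. Only hibernate.search.<indexName>.worker.backend comes from the discussion above. The other property names and all values are recalled from the reference documentation and should be checked against it. The "audit" index, the resolve helper and its fallback to the "default" scope are assumptions made for illustration.

```java
import java.util.Properties;

// Illustrative sketch; property names other than worker.backend and all values
// are assumptions to be verified against the reference documentation.
public class BackendConfigSketch {

    /** Looks up an index-scoped setting, falling back to the "default" scope. */
    public static String resolve(Properties cfg, String indexName, String key) {
        String specific = cfg.getProperty("hibernate.search." + indexName + "." + key);
        return specific != null ? specific : cfg.getProperty("hibernate.search.default." + key);
    }

    /** "Choosing the JMS backend" really means setting a group of properties. */
    public static Properties jmsSlave() {
        Properties cfg = new Properties();
        cfg.setProperty("hibernate.search.default.worker.backend", "jms");
        cfg.setProperty("hibernate.search.default.worker.jms.connection_factory", "/ConnectionFactory");
        cfg.setProperty("hibernate.search.default.worker.jms.queue", "queue/hibernatesearch");
        cfg.setProperty("hibernate.search.default.directory_provider", "filesystem-slave");
        // A hypothetical "audit" index keeps a local backend and directory.
        cfg.setProperty("hibernate.search.audit.worker.backend", "lucene");
        cfg.setProperty("hibernate.search.audit.directory_provider", "filesystem");
        return cfg;
    }

    public static void main(String[] args) {
        Properties cfg = jmsSlave();
        System.out.println("books -> " + resolve(cfg, "books", "worker.backend"));
        System.out.println("audit -> " + resolve(cfg, "audit", "worker.backend"));
    }
}
```

A single higher-level option, as suggested in the log, would amount to generating such a group of properties from one line of configuration.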

New features

Do you want any new features per se? I think this would be a great time to get the community involved to back new features and fix bugs while we do the grunt work for 4. So if you know some shy but motivated people, or if you are one of them, stand up.