32 Replies Latest reply: Oct 29, 2009 11:59 AM by Emanuel Muckenhuber

integration with the Papaki annotation indexer/repository

Scott Marlow Master

It is time to start looking at the integration of the Papaki annotation indexer with the microcontainer. The purpose, as I understand it, is to minimize the scanning of classes for annotations. Papaki will give us a repository that caches the annotations and a build-time indexer for identifying which classes actually have annotations.

I would like to help measure the performance gain that we might see by integrating Papaki into the Microcontainer (with the understanding that Papaki may need to be tweaked/improved as we go down this path).

I will take a dive into the MC code to see what I can learn about the possible integration. If someone who knows better wants to just point out which five lines of code to change, that would be welcome! :-)

  • 1. Re: integration with the Papaki annotation indexer/repositor
    Ales Justin Master

     

    "smarlow@redhat.com" wrote:

    I will take a dive into the MC code to see what I can learn about the possible integration. If someone who knows better wants to just point out which five lines of code to change, that would be welcome! :-)

    You should simply do the following:

    * write a new POST_CL deployer (PapakiDeployer) that uses Papaki to read indexed annotations
    and creates AnnotationEnvironment (AE) for it (+ attaches it to DU)

    * make sure current deployer that creates AE doesn't run if AE already exists in DU
    (+ make sure this deployer runs *after* PapakiDeployer)
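
    A minimal sketch of the two steps above, not the real Deployers API: `Unit`, `AnnotationEnv`, `IndexDeployer` and `ScanningDeployer` are hypothetical stand-ins for the DU, the AE, the proposed PapakiDeployer and the existing AE-creating deployer. It only illustrates the ordering and the skip-if-already-attached check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PapakiDeployerSketch {
    /** Hypothetical stand-in for a deployment unit (DU): a bag of typed attachments. */
    static class Unit {
        private final Map<Class<?>, Object> attachments = new HashMap<>();
        <T> void addAttachment(Class<T> type, T value) { attachments.put(type, value); }
        <T> T getAttachment(Class<T> type) { return type.cast(attachments.get(type)); }
    }

    /** Hypothetical stand-in for the AnnotationEnvironment (AE); records who built it. */
    static class AnnotationEnv {
        final String source;
        AnnotationEnv(String source) { this.source = source; }
    }

    interface Deployer { void deploy(Unit unit); }

    /** Step 1: build the AE from a prebuilt index, if the deployment ships one. */
    static class IndexDeployer implements Deployer {
        private final boolean indexAvailable;
        IndexDeployer(boolean indexAvailable) { this.indexAvailable = indexAvailable; }
        public void deploy(Unit unit) {
            if (indexAvailable) {
                unit.addAttachment(AnnotationEnv.class, new AnnotationEnv("index"));
            }
        }
    }

    /** Step 2: the scanning deployer does nothing if an AE is already attached. */
    static class ScanningDeployer implements Deployer {
        public void deploy(Unit unit) {
            if (unit.getAttachment(AnnotationEnv.class) != null) {
                return;
            }
            unit.addAttachment(AnnotationEnv.class, new AnnotationEnv("scan"));
        }
    }

    /** Runs the deployers in list order; the index-based deployer must come first. */
    static String deploy(boolean indexAvailable) {
        List<Deployer> chain = new ArrayList<>();
        chain.add(new IndexDeployer(indexAvailable));
        chain.add(new ScanningDeployer());
        Unit unit = new Unit();
        for (Deployer d : chain) {
            d.deploy(unit);
        }
        return unit.getAttachment(AnnotationEnv.class).source;
    }

    public static void main(String[] args) {
        System.out.println(deploy(true));
        System.out.println(deploy(false));
    }
}
```

    With an index available, the AE comes from the index and the scanning deployer is a no-op; without one, scanning proceeds as before.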

  • 2. Re: integration with the Papaki annotation indexer/repositor
    Emanuel Muckenhuber Master

    I guess it could also make sense to see if we can reuse this annotation index in MDR as well?

    Besides that, we would need a better programmatic API for excluding resources from scanning. I think this is actually the most important piece we have to do, as not doing any annotation scanning at all would most probably be faster than indexing :)

    So we would need to create a ScanningMetaData when e.g. JBossEjbMetaData.isMetaDataComplete() returns true. This would then exclude all non-EE5 deployments from annotation scanning without a separate jboss-scanning.xml. As far as I can remember, for Servlet 3.0 we would need to be able to exclude resources on a per-.jar basis as well.

  • 3. Re: integration with the Papaki annotation indexer/repositor
    Ales Justin Master

     

    "emuckenhuber" wrote:
    I guess it could also make sense to see if we can reuse this annotation index in MDR as well?

    Yup.
    I guess AE really belongs in Papaki. At least the spi/api/interfaces.
    (and afair my code, it doesn't have any Deployers dependencies)

    "emuckenhuber" wrote:

    Besides that, we would need a better programmatic API for excluding resources from scanning. I think this is actually the most important piece we have to do, as not doing any annotation scanning at all would most probably be faster than indexing :)

    So we would need to create a ScanningMetaData when e.g. JBossEjbMetaData.isMetaDataComplete() returns true. This would then exclude all non-EE5 deployments from annotation scanning without a separate jboss-scanning.xml. As far as I can remember, for Servlet 3.0 we would need to be able to exclude resources on a per-.jar basis as well.

    This should all be a part of ScanningMetaData - some helper class; e.g. like BMDBuilder --> SMDBuilder.
    It would then be trivial to build ScanningMetaData and place it into DU.

    And then PapakiDeployer should take ScanningMetaData into account while building AE.
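
    A rough sketch of what such a helper might look like. `ScanningInfo` and `Builder` are hypothetical, simplified stand-ins for ScanningMetaData and the suggested SMDBuilder; the real ScanningMetaData is richer than a list of path prefixes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ScanningMetaDataSketch {
    /** Hypothetical, simplified scanning metadata: which resource paths to skip. */
    static final class ScanningInfo {
        private final boolean skipAll;
        private final List<String> excludedPrefixes;

        ScanningInfo(boolean skipAll, List<String> excludedPrefixes) {
            this.skipAll = skipAll;
            this.excludedPrefixes = Collections.unmodifiableList(new ArrayList<>(excludedPrefixes));
        }

        /** True if the given resource path should be scanned for annotations. */
        boolean shouldScan(String path) {
            if (skipAll) {
                return false;
            }
            for (String prefix : excludedPrefixes) {
                if (path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypothetical builder in the spirit of the SMDBuilder idea. */
    static final class Builder {
        private boolean skipAll;
        private final List<String> excluded = new ArrayList<>();

        /** Metadata-complete deployments need no annotation scanning at all. */
        Builder metadataComplete(boolean complete) { this.skipAll = complete; return this; }

        /** Excludes everything under a prefix, e.g. a single nested .jar. */
        Builder excludePath(String prefix) { excluded.add(prefix); return this; }

        ScanningInfo build() { return new ScanningInfo(skipAll, excluded); }
    }

    public static void main(String[] args) {
        ScanningInfo legacy = new Builder().metadataComplete(true).build();
        ScanningInfo web = new Builder().excludePath("WEB-INF/lib/legacy.jar").build();
        System.out.println(legacy.shouldScan("com/acme/Foo.class"));
        System.out.println(web.shouldScan("WEB-INF/lib/legacy.jar/com/acme/Old.class"));
        System.out.println(web.shouldScan("WEB-INF/classes/com/acme/New.class"));
    }
}
```

    A deployer could then build one of these from isMetaDataComplete() and attach it to the DU for the annotation deployer to consult.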

  • 4. Re: integration with the Papaki annotation indexer/repositor
    Emanuel Muckenhuber Master

     

    "alesj" wrote:
    "emuckenhuber" wrote:
    I guess it could also make sense to see if we can reuse this annotation index in MDR as well?

    Yup.
    I guess AE really belongs in Papaki. At least the spi/api/interfaces.
    (and afair my code, it doesn't have any Deployers dependencies)


    It most probably should be part of the AS integration. As we only add it to the DU, we don't need the deployers to depend on it as well.

    "alesj" wrote:

    "emuckenhuber" wrote:

    Besides that, we would need a better programmatic API for excluding resources from scanning. I think this is actually the most important piece we have to do, as not doing any annotation scanning at all would most probably be faster than indexing :)

    So we would need to create a ScanningMetaData when e.g. JBossEjbMetaData.isMetaDataComplete() returns true. This would then exclude all non-EE5 deployments from annotation scanning without a separate jboss-scanning.xml. As far as I can remember, for Servlet 3.0 we would need to be able to exclude resources on a per-.jar basis as well.

    This should all be a part of ScanningMetaData - some helper class; e.g. like BMDBuilder --> SMDBuilder.
    It would then be trivial to build ScanningMetaData and place it into DU.

    And then PapakiDeployer should take ScanningMetaData into account while building AE.


    Yeah something simple like that should be enough.

    We could also add some simple deployers to the 5_x branch - at least generating an empty ScanningMetaData for isMetaDataComplete() does not require Papaki. This could speed up legacy deployments quite a bit, as we wouldn't process the already scanned annotations anyway.

  • 5. Re: integration with the Papaki annotation indexer/repositor
    Ales Justin Master

     

    "emuckenhuber" wrote:

    It most probably should be part of the AS integration. As we only add it to the DU, we don't need the deployers to depend on it as well.

    Yes, sounds even better.
    e.g. other AS components could just rely on it, w/o knowing anything about Papaki.

    "emuckenhuber" wrote:

    We could also add some simple deployers to the 5_x branch - at least generating an empty ScanningMetaData for isMetaDataComplete() does not require Papaki. This could speed up legacy deployments quite a bit, as we wouldn't process the already scanned annotations anyway.

    I think ScottM already did something similar.


  • 6. Re: integration with the Papaki annotation indexer/repositor
    Scott Marlow Master

     

    "alesj" wrote:

    You should simply do the following:

    * write a new POST_CL deployer (PapakiDeployer) that uses Papaki to read indexed annotations
    and creates AnnotationEnvironment (AE) for it (+ attaches it to DU)

    * make sure current deployer that creates AE doesn't run if AE already exists in DU
    (+ make sure this deployer runs *after* PapakiDeployer)


    Is this a quick answer for performance testing only or a suggestion of how to do the actual integration with Papaki?

    How will this impact jboss-mdr and the merging of metadata in deployment xml with the class annotations?

  • 7. Re: integration with the Papaki annotation indexer/repositor
    Ales Justin Master

     

    "smarlow@redhat.com" wrote:

    Is this a quick answer for performance testing only or a suggestion of how to do the actual integration with Papaki?

    The actual integration.

    "smarlow@redhat.com" wrote:

    How will this impact jboss-mdr and the merging of metadata in deployment xml with the class annotations?

    The annotations from xml have little to do with this.
    That's just a matter of putting them under proper scope in MDR.

    MDR and Papaki issue is how to properly push Papaki's info into MDR.
    Once we have Papaki info --> AnnotationEnv, is this good enough for MDR or do we need some more stuff, ...

    I guess what we now have under MergingAnnDeployer should be replaced with MDR population,
    and from then on, every external annotation user should check the right MDR's MetaData instance.


  • 8. Re: integration with the Papaki annotation indexer/repositor
    Jesper Pedersen Master

    Ales, can you describe the use-case for JBANN-43 ? In Papaki you explicitly specify which URL resources you want scanned - each URL can be a file or a directory. Filtering is done through the Papaki-specific metadata.

    For JBANN-44 I think it should be a switch - default to false - as most projects don't care about annotation definitions, but rather where a specific annotation is located - e.g. which classes, methods, ...

    JBANN-45 needs to be handled through plugins for the various methods. Currently we have the exclude / excludeAll parameters for the indexer.

    JBANN-46 - Papaki is a standalone library with no dependencies on specific vendor APIs. Adding support for the vfszip: protocol could be done through reflection in order to not create a dependency.
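
    A small sketch of the reflection idea, assuming nothing about the actual VFS classes: `org.example.vfs.ZipHandler` is a made-up class name, and `strategyFor` is a hypothetical helper. The point is only that the optional class is looked up by name at runtime, so the library compiles and runs without it.

```java
import java.util.Optional;

class OptionalSupportSketch {
    /** Looks up a class by name at runtime, so there is no compile-time dependency on it. */
    static Optional<Class<?>> findOptionalClass(String className) {
        try {
            return Optional.<Class<?>>of(Class.forName(className));
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    /** Picks a strategy for a URL protocol; "file" and "jar" need no extra library. */
    static String strategyFor(String protocol, String optionalHandlerClass) {
        if ("file".equals(protocol) || "jar".equals(protocol)) {
            return "builtin";
        }
        return findOptionalClass(optionalHandlerClass).isPresent() ? "reflective" : "unsupported";
    }

    public static void main(String[] args) {
        // The handler class name is invented, so it will not be found on the classpath.
        System.out.println(strategyFor("vfszip", "org.example.vfs.ZipHandler"));
        System.out.println(strategyFor("file", "org.example.vfs.ZipHandler"));
    }
}
```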

  • 9. Re: integration with the Papaki annotation indexer/repositor
    jaikiran pai Master

     

    "jesper.pedersen" wrote:

    Adding support for the vfszip: protocol could be done through reflection in order to not create a dependency.


    My understanding was that with the new VFS3, there would no longer be custom protocols like vfszip. Am I wrong?


  • 10. Re: integration with the Papaki annotation indexer/repositor
    Ales Justin Master

     

    "jesper.pedersen" wrote:
    Ales, can you describe the use-case for JBANN-43 ? In Papaki you explicitly specify which URL resources you want scanned - each URL can be a file or a directory. Filtering is done through the Papaki-specific metadata.

    The use case is the example from the JIRA issue.

    e.g.
    All our restrictions are held in Module, but that's mostly impl detail.
    The way we expose that is via proper visitor pattern.

    e.g.
    A user can provide ScanningMetaData on-the-fly,
    and we should take that into account when:
    * re-creating old serialized AR
    * creating AR at runtime

    Not to mention resources lookup is *the* use-case where you need to apply visitor pattern.
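
    To make the visitor idea concrete, here is a minimal sketch. `ResourceVisitor`, `visitAll` and `collectClasses` are hypothetical names, not the VFS or ClassLoading API; the exclusion predicate plays the role of restrictions supplied at runtime (for example from ScanningMetaData).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ResourceVisitorSketch {
    /** Hypothetical visitor: decides what to accept and handles each accepted resource. */
    interface ResourceVisitor {
        boolean accepts(String resourcePath);
        void visit(String resourcePath);
    }

    /** Walks the resources and hands only the accepted ones to the visitor. */
    static void visitAll(List<String> resources, ResourceVisitor visitor) {
        for (String resource : resources) {
            if (visitor.accepts(resource)) {
                visitor.visit(resource);
            }
        }
    }

    /** Collects class resources, honouring exclusions supplied at runtime. */
    static List<String> collectClasses(List<String> resources, Predicate<String> excluded) {
        List<String> found = new ArrayList<>();
        visitAll(resources, new ResourceVisitor() {
            public boolean accepts(String path) {
                return path.endsWith(".class") && !excluded.test(path);
            }
            public void visit(String path) {
                found.add(path);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList("a/A.class", "b/B.class", "a/readme.txt");
        System.out.println(collectClasses(resources, path -> path.startsWith("b/")));
    }
}
```

    The walking code never needs to know why a resource is excluded; that decision stays with whoever supplies the visitor.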

    "jesper.pedersen" wrote:

    For JBANN-44 I think it should be a switch - default to false - as most projects don't care about annotation definitions, but rather where a specific annotation is located - e.g. which classes, methods, ...

    OK, this could be made optional.

    "jesper.pedersen" wrote:

    JBANN-45 needs to be handled through plugins for the various methods. Currently we have the exclude / excludeAll parameters for the indexer.

    That's too simplistic.
    You should have a proper abstraction.

    "jesper.pedersen" wrote:

    JBANN-46 - Papaki is a standalone library with no dependencies on specific vendor APIs. Adding support for the vfszip: protocol could be done through reflection in order to not create a dependency.

    I don't like to say this, but I'll still go ahead:
    Papaki, as it currently is in the trunk, is a complete reinvention of the wheel.
    The standalone argument doesn't convince me.

    And I spent the whole weekend looking at it and thinking about how to best integrate it with Deployers,
    only to realize that a complete rewrite is what we should do.

    Here is my version of it, which can be quite easily integrated with Deployers,
    plus it uses all of the abstractions that are already under the JBoss umbrella:
    * Reflect -- abstraction between JDK Introspection and Javassist
    * MDR - simplifying lookup via Signature
    * VFS - resources lookup abstraction and visitor pattern
    * ClassLoading - exact class resources visitor pattern

    Location: http://anonsvn.jboss.org/repos/jbossas/projects/annotations/branches/AnnEnv/

  • 11. Re: integration with the Papaki annotation indexer/repositor
    Jesper Pedersen Master

     


    "alesj" wrote:

    The use case is the example from the JIRA issue.

    e.g.
    All our restrictions are held in Module, but that's mostly impl detail.
    The way we expose that is via proper visitor pattern.

    e.g.
    A user can provide ScanningMetaData on-the-fly,
    and we should take that into account when:
    * re-creating old serialized AR
    * creating AR at runtime

    Not to mention resources lookup is *the* use-case where you need to apply visitor pattern.


    That is a finer-grained API than the current API - and some of it will be in an SPI. But this also implies a more dynamic annotation model - which is maybe something for a 1.1 release based on feedback from the community.


    "alesj" wrote:

    That's too simplistic.
    You should have a proper abstraction.


    Creating the needed metadata is an assembly-time operation - so it only needs to be done once. And yes, supporting additional methods of reading existing metadata could benefit the indexer.


    "alesj" wrote:

    I don't like to say this, but I'll still go ahead:
    Papaki, as it currently is in the trunk, is a complete reinvention of the wheel.
    The standalone argument doesn't convince me.

    And I spent the whole weekend looking at it and thinking about how to best integrate it with Deployers,
    only to realize that a complete rewrite is what we should do.

    Here is my version of it, which can be quite easily integrated with Deployers,
    plus it uses all of the abstractions that are already under the JBoss umbrella:
    * Reflect -- abstraction between JDK Introspection and Javassist
    * MDR - simplifying lookup via Signature
    * VFS - resources lookup abstraction and visitor pattern
    * ClassLoading - exact class resources visitor pattern

    Location: http://anonsvn.jboss.org/repos/jbossas/projects/annotations/branches/AnnEnv/


    The goal of Papaki is to provide a simple API for developers, so that the library can be deployed in any environment (as one of the current users does on WebSphere).

    If the current API and scope of the project don't fit into the MC project, I welcome a fork of the code into a new project where we can share ideas across the projects.

  • 12. Re: integration with the Papaki annotation indexer/repositor
    David Lloyd Master

    So - we are taking a small project, adding dependencies and making it more complex?

    Papaki should actually be more minimal than it is. The indexes should be built by directly examining the file. Think about it - possibly every class in every JAR that is ever deployed may end up being read by this. There is absolutely no way that reflection is going to cut it performance-wise, nor do I think Javassist will suffice.

    The two key requirements for Papaki are: speed of execution and compactness/efficiency of indexing. If it doesn't hit both of these out of the park, we're going to be taking a step backwards in terms of performance. I don't see why it's anything more than a highly string-optimized map implementation (patricia trie perhaps) plus a tuned byte scanner.
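
    As an illustration of what "directly examining the file" could mean, here is a sketch of a byte-level pre-check. It is not Papaki's implementation. It walks only the constant pool of a class file and looks for the attribute name `RuntimeVisibleAnnotations`; if that name is absent, the class cannot carry runtime-visible class, field or method annotations, and a hit only means the class deserves a closer look. `fakeClassPrefix` is a helper invented for the demo that builds a header plus constant pool rather than a full class file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class ConstantPoolScanSketch {
    private static final byte[] TARGET =
        "RuntimeVisibleAnnotations".getBytes(StandardCharsets.US_ASCII);

    /** True if the constant pool contains the Utf8 entry "RuntimeVisibleAnnotations". */
    static boolean mayHaveAnnotations(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default, like class files
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.position(buf.position() + 4); // skip minor and major version
        int count = buf.getShort() & 0xFFFF;
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            switch (tag) {
                case 1: { // Utf8: u2 length followed by that many bytes
                    int len = buf.getShort() & 0xFFFF;
                    if (matches(buf, len)) {
                        return true;
                    }
                    buf.position(buf.position() + len);
                    break;
                }
                case 5: case 6: // Long, Double: 8 bytes, and they occupy two pool slots
                    buf.position(buf.position() + 8);
                    i++;
                    break;
                case 15: // MethodHandle
                    buf.position(buf.position() + 3);
                    break;
                case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                    buf.position(buf.position() + 2);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    buf.position(buf.position() + 4);
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
        }
        return false;
    }

    /** Compares the next len bytes with TARGET without moving the buffer position. */
    private static boolean matches(ByteBuffer buf, int len) {
        if (len != TARGET.length) {
            return false;
        }
        int start = buf.position();
        for (int j = 0; j < len; j++) {
            if (buf.get(start + j) != TARGET[j]) {
                return false;
            }
        }
        return true;
    }

    /** Demo helper: builds a class-file header plus a constant pool of ASCII Utf8 entries. */
    static byte[] fakeClassPrefix(String... utf8Entries) {
        ByteBuffer out = ByteBuffer.allocate(1024);
        out.putInt(0xCAFEBABE).putShort((short) 0).putShort((short) 52);
        out.putShort((short) (utf8Entries.length + 1)); // pool count is entries + 1
        for (String s : utf8Entries) {
            byte[] b = s.getBytes(StandardCharsets.US_ASCII);
            out.put((byte) 1).putShort((short) b.length).put(b);
        }
        byte[] result = new byte[out.position()];
        out.flip();
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mayHaveAnnotations(fakeClassPrefix("Foo", "RuntimeVisibleAnnotations")));
        System.out.println(mayHaveAnnotations(fakeClassPrefix("Foo", "Code")));
    }
}
```

    No class is loaded and no object model is built; the scan stops at the first match or at the end of the constant pool.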

    Adding dependencies is a bad idea. Projects should as a rule have the minimum number of runtime dependencies possible. Having lots of dependencies just makes it harder to refactor later. If you have a large web of dependencies between a set of projects, that's a big indicator that they ought to be one big project anyway, or at least split up differently.

  • 13. Re: integration with the Papaki annotation indexer/repositor
    Jesper Pedersen Master

     


    David Lloyd wrote:

    Papaki should actually be more minimal than it is. The indexes should be built by directly examining the file. Think about it - possibly every class in every JAR that is ever deployed may end up being read by this. There is absolutely no way that reflection is going to cut it performance-wise, nor do I think Javassist will suffice.


    The indexer in Papaki builds an index over where annotations are located within the .class or .jar file. If the index is present but empty, the framework will skip the entire annotation parsing stage. If an index is present and contains annotation metadata, only the part of the .class file where the annotation is located is scanned.

    The annotations themselves can't be stored in the metadata as they are not serializable, but all the metadata about them can.

    Furthermore the scanner can be configured to scan only the parts of the classes you are interested in - e.g. if an annotation can only be located on a public class - at class level - then all non-public classes, all fields, all constructors, all methods and all parameters are skipped.
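
    The index rules described above could be sketched as a small decision function. The names (`Plan`, `planFor`, `locationsFor`) and the map-based index shape are hypothetical, and treating a missing index as "scan everything as before" is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexPlanSketch {
    enum Plan { FULL_SCAN, SKIP, INDEXED_ONLY }

    /** The index maps a class name to its annotated locations; null means no index was shipped. */
    static Plan planFor(Map<String, List<String>> index) {
        if (index == null) {
            return Plan.FULL_SCAN; // assumption: without an index, scan as before
        }
        if (index.isEmpty()) {
            return Plan.SKIP; // index present but empty: skip annotation parsing entirely
        }
        return Plan.INDEXED_ONLY; // read only the locations the index points at
    }

    /** Locations worth reading for one class under INDEXED_ONLY; empty if the class is not indexed. */
    static List<String> locationsFor(Map<String, List<String>> index, String className) {
        return index.getOrDefault(className, Collections.emptyList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        System.out.println(planFor(null));
        System.out.println(planFor(index));
        index.put("com.acme.OrderBean", Arrays.asList("class", "method:place"));
        System.out.println(planFor(index));
        System.out.println(locationsFor(index, "com.acme.OrderBean"));
    }
}
```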

    HTH

  • 14. Re: integration with the Papaki annotation indexer/repositor
    David Lloyd Master

    But I thought the idea was that we don't know what annotations we're interested in at the time the classes are scanned. Isn't that the whole point? So we have to index all of them (unless there is some hint from the user that scanning is not necessary for a class or package or whatever) and remember (sans-classloading) where annotations are located, and the relationships between classes, so that we can find all classes and subclasses with certain annotations without actually loading any classes.
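
    A sketch of the kind of index this describes, working purely on names so that no class is ever loaded. `AnnotationIndexSketch` and its methods are hypothetical and ignore interfaces, meta-annotations and member-level annotations.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AnnotationIndexSketch {
    private final Map<String, Set<String>> classesByAnnotation = new HashMap<>();
    private final Map<String, Set<String>> subclassesByParent = new HashMap<>();

    /** Records one class by name only: its superclass (may be null) and its annotation names. */
    void addClass(String className, String superName, String... annotations) {
        if (superName != null) {
            subclassesByParent.computeIfAbsent(superName, k -> new TreeSet<>()).add(className);
        }
        for (String annotation : annotations) {
            classesByAnnotation.computeIfAbsent(annotation, k -> new TreeSet<>()).add(className);
        }
    }

    /** Classes carrying the annotation directly, plus all of their transitive subclasses. */
    Set<String> findWithSubclasses(String annotation) {
        Set<String> result = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(
            classesByAnnotation.getOrDefault(annotation, Collections.emptySet()));
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (result.add(current)) {
                todo.addAll(subclassesByParent.getOrDefault(current, Collections.emptySet()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AnnotationIndexSketch index = new AnnotationIndexSketch();
        index.addClass("com.acme.Base", null, "javax.ejb.Stateless");
        index.addClass("com.acme.Child", "com.acme.Base");
        index.addClass("com.acme.Other", null);
        System.out.println(index.findWithSubclasses("javax.ejb.Stateless"));
    }
}
```

    Because every annotation name is indexed up front, a query for an annotation nobody anticipated at indexing time still works.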
