
Some Tips for Configuring Endeca for Optimal Production Performance with ATG


Introduction

The following tips will ensure that you are achieving optimum performance from your production Oracle Commerce Endeca environment.

Main Article

Make sure Preview isn’t enabled in production

/dyn/admin/nucleus/atg/endeca/assembler/cartridge/manager/AssemblerSettings/ : Make sure previewEnabled = false

Setting this to true causes too much merchandising-rule debug data to be returned with the results. You can confirm this by looking at the Dgraph.reqlog and checking for merchdebug=true in the queries.
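
A minimal sketch of the corresponding Nucleus properties layer (the layer location below is an assumption; the component path matches the /dyn/admin URL above):

# localconfig/atg/endeca/assembler/cartridge/manager/AssemblerSettings.properties
# Disable preview on production so merch-rule debug data is not returned
previewEnabled=false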

Make sure you have the latest Assembler patches

Ensure that you have the following patches applied. (These are included in version 11 of Oracle Commerce, so no patching is required on that release.)

  • Patch 17342677: This includes two fixes. First, it reduces the number of supplemental objects returned with queries, which helps performance. This is for ATG 10.1.2 and 10.2, but more precisely for Assembler 3.1.1 and 3.1.2.
  • Second, it fixes an XML parser locking problem that can be seen during performance testing.
  • To verify that the patch was correctly installed, make sure the request logs start including the following with the requests: &merchrulefilter=endeca.internal.nonexistent

This patch should be considered critical and always applied.

Check the properties being returned by Endeca

In Assembler, you can select which properties are returned back with the search results.

If this is not configured, then all of the properties configured in Developer Studio will be returned (and this can be a large set).

If you have misspelled property names, an error message will be written to the Dgraph.log. This causes a slight I/O hit on the MDEX and can fill up that log very quickly.

By default, this configuration can be found at /atg/endeca/assembler/cartridge/handler/config/ResultsListConfig.properties (which ships in /Store/Endeca/Assembler/config/config.jar).
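
For illustration, a trimmed ResultsListConfig.properties might look like the following; the fieldNames property name and the attribute names are assumptions to verify against your Assembler version:

# ResultsListConfig.properties (illustrative sketch)
$class=com.endeca.infront.cartridge.ResultsListConfig
$scope=prototype
# Only return the record attributes the page actually renders (example names)
fieldNames=product.repositoryId,product.displayName,product.briefDescription,sku.activePrice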

Make sure your Assembler logging is configured correctly

Assembler will create an Endeca LogEntry for each page view and send that message to the Endeca logging server. This happens asynchronously, and the Endeca logging API will spawn a separate thread to handle the communication with the logging server.

If this is configured incorrectly, that separate thread will spawn, try to connect to the logging server, wait, and fail. Each thread eventually goes away, but the log entries build up in memory. If you are taking heap dumps and see problems with com.endeca.logging.LogEntry or LogConnection, this is probably the cause.

For Assembler, you would configure this as an assemblerEventListener on the NucleusAssemblerFactory configuration ( /atg/endeca/assembler/NucleusAssemblerFactory in /dyn/admin ).
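
For illustration only, a sketch of what that wiring might look like; both the assemblerEventListeners property name and the listener component path are assumptions, so check them against your ATG installation:

# localconfig/atg/endeca/assembler/NucleusAssemblerFactory.properties
# Register the listener that ships log entries to the Endeca logging server
# (the component path below is hypothetical)
assemblerEventListeners+=/atg/endeca/assembler/event/LoggingAssemblerEventListener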

Remove any comments from your Experience Manager cartridges and templates

In the XML files that you create for your cartridges and templates in Experience Manager, remove all of the XML comments. Especially remove any Oracle disclaimer text from the top. These comments are returned in the response back from the MDEX Engine and will increase the response size. (In the Discover Electronics application, this text will increase the response size by around 10%).

NOTE: The Experience Manager tool (XM) does not add these “copyright” comments. Any content item (or page) that is edited and saved by XM will not have any comments, so any page or content saved by the tool will not have this issue. The XML comments are only introduced to the system by importing initial content for Discover Electronics. The XML comments were added by hand outside of the tool, and they were only added to sample configuration files. Hopefully, you won’t run your application with this sample configuration.

Check your network latency

Depending on the number of cartridges (and the type of those cartridges) ATG might have to make a number of calls to the MDEX Engine. If the network isn’t fast, this can introduce latency into the application even if the MDEX Engine has to do very little processing.

Make sure the load balancer is working correctly and that the network connections are as fast as possible. If possible, use a network packet sniffer or a tool like Wireshark to trace the round-trip packets from ATG to your MDEX instances so that you can verify that the network-incurred latency has been minimized.

Check your URL Patterns to ensure non-Endeca URLs don’t hit Assembler

http://docs.oracle.com/cd/E41069_01/Platform.11-0/ATGEndecaIntegrationGuide/html/s0706assemblerpipelineservlet01.html

In ATG, there is an AssemblerPipelineServlet. This reads in URLs and processes them through Assembler. If it's not correctly configured, too many URLs might flow through Assembler for no reason. We've seen this at several customer sites where the ATG application was serving up product images (as opposed to hosting them externally at Akamai or similar). Their URL patterns were incorrectly configured, and for every image served, a useless call was being made to Assembler.

The “/atg/endeca/assembler/AssemblerPipelineServlet.ignoreRequestURIPattern” is the property to configure.
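
A sketch of excluding static resources from Assembler processing; the regular expression is illustrative only and should be adapted to your URL space:

# localconfig/atg/endeca/assembler/AssemblerPipelineServlet.properties
# Don't run Assembler for static assets such as images and stylesheets (example pattern)
ignoreRequestURIPattern=.*\\.(gif|jpg|jpeg|png|css|js|ico)(\\?.*)?$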

Records Per Aggregate Record set to all

Quick background: In Endeca, you can have SKU-level records that are aggregated together into an Aggregate record. You can choose how to bring back the SKU level information. Either 0 SKU’s, 1, or all. By default, this setting is set to bring back All SKU’s in an aggregate.

This isn’t necessary; it creates larger responses from Endeca and uses more memory in ATG.

This can be configured in the ResultsListHandler.
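
A hedged sketch of that setting; the subRecordsPerAggregateRecord property name and the ONE value are assumptions based on the Assembler ResultsListConfig API:

# ResultsListConfig.properties (illustrative sketch)
# Return one representative SKU per aggregate record instead of all of them
subRecordsPerAggregateRecord=ONE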

All Dimensions Compute Refinement Counts

By default, all dimensions generated via ATG schema are set to compute refinement statistics. You should turn this off if the site does not display them. This can be done by adding the dimensions to the schema.csv in the config/api_input directory and setting the attribute.dimension.compute_refinement_counts to false. You can also globally disable the refinement counts by setting the refineShowRecordCounts property to false in the dimensionTemplate bean in the fcm.context.xml file.
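
For the global option, a sketch of the fcm.context.xml change; the bean definition is abbreviated and only the refineShowRecordCounts property comes from the text above:

<!-- fcm.context.xml (abbreviated): disable refinement counts on generated dimensions -->
<bean id="dimensionTemplate" class="...">
  ...
  <property name="refineShowRecordCounts" value="false"/>
</bean>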

This will save processing time in the MDEX Engine, but won’t really affect response sizes or memory usage by ATG.

Ensure that custom handlers are scoped as Prototype

Incorrectly scoped handlers can cause ConcurrentModificationExceptions. If you create or configure an ATG Nucleus component based on an Endeca class/bean, you need to set the $scope for that component to prototype. The Assembler object, created via /atg/endeca/assembler/AssemblerTools, retrieves objects from the ECR/IFCR via Spring. With scope set to global, Assembler retrieves the same specific object from Spring each time, which is why multiple threads end up sharing the same object. With scope set to prototype, Assembler retrieves a fresh copy of the object, which is not shared across multiple threads. For example:

./localconfig/foobar/endeca/assembler/custom/cartridge/handler/config/ContentSlotListConfig.properties
$class=com.endeca.infront.cartridge.ContentSlotListConfig
$scope=prototype

To clarify a little, the out-of-the-box Commerce Reference Store cartridge handler default configuration components happen to function correctly when configured with global scope because their cartridge handlers don’t attempt to modify them in any way. This is not true of the ContentSlotHandler (or of the cartridge handler contract in general). The recommendation is therefore to always use prototype scope for default configuration objects. Doing so, although conservative for some handlers, makes it easy to ensure correctness. The sample Spring files supplied with the Endeca 11.0.0 release conform to this recommendation.


Sites Asset Modeling when integrating with Endeca


Introduction

The combination of Sites + Endeca is a good fit as the products complement each other nicely with little overlap. However, there is no current best practice for what kinds of modeling changes should be considered (if any) when the two products are integrated. Since Endeca has exceptionally robust tools to precisely control denormalization, a best practice might be to consider leveraging Endeca to perform denormalization and not use the built-in Sites flex parent mechanism.

Main Article

Denormalization is a key ingredient in delivering performant websites. Sites was one of the first CMSes to offer OOTB denormalization via its flex parent feature. While flex parents are generally considered a key feature of the product, they are (unfortunately) limited to an “all or nothing” behavior: *all* attributes at the parent level are “inherited” by all children under the parent*. There is no provision for “some attributes inherit but others should not”. As such, loading up a flex parent with extraneous attributes is a universally recognized bad practice: every child will inherit attributes that it doesn’t need, resulting in significant database bloat (and subsequent performance issues). Furthermore, editing a flex parent forces all children to be updated (even if they don’t need to be). As such, it has become common practice (some would say a “best practice”) to not specify *any* editable attributes at the parent level! This somewhat defeats the purpose of flex parents, doesn’t it?

The business case is both simple and obvious: customers envision their siteplan as akin to folders on their website, but they also see these “folders” as having individual attributes that represent the webpage itself. Additionally, some of these attributes (but not all) need to be inherited by all children of the folder for various reasons: searching, filtering, organizing, etc. Using flex parents to represent such “siteplan folders” simply doesn’t work in the real world for most Sites implementations, since updating such assets is an implied requirement and, as previously discussed, updating a flex parent can be significantly onerous in that potentially hundreds (even thousands) of child assets might need to be republished on any given parent update. Additionally, flex parents cannot effectively be used to create hierarchies, another implied requirement.

In contrast, one of the key strengths of Endeca is that it is very adept at precise and customizable denormalization. Endeca makes no assumptions about the data: via its tools one creates an ingestion “pipeline” that imports data into the MDEX, and via this pipeline one can decide which attributes are denormalized and which are not.
As such, the combination of Endeca + Sites is ideal: Endeca excels at search, denormalization, and pagination (all of which Sites is weak at) and Sites is ideal as a caching framework, drag-and-drop page layout, and general content management (all of which Endeca is weak at).

To that end, I propose that when modeling for a project where Sites + Endeca is known to be the solution, one should avoid using Sites for denormalization and use Endeca for that task instead. Pragmatically, this means that one should not use flex parent for inherited attributes (unless absolutely necessary and only on the condition that such attributes are rarely if ever updated). In fact, I will go a bit further: I believe that the best solution for packaging up attributes to be “denormalized” down to the children would be to design various definitions of the Page asset to create one or more taxonomies of siteplan nodes, each of which stores attributes that can be used for guided search by Endeca as per the rules embodied in its pipeline.

NOTE: using the Page asset aligns very well with an important restriction in Endeca: that hierarchies can only have a single value (i.e. they must not be multi-valued). Flex parents of course do not have this limitation thus present a potential problem when integrating with Endeca. In contrast, Page assets are always single valued with regards to their hierarchies, thus presenting no issues with Endeca’s restriction on hierarchies. In other words: Page assets are an ideal way of representing hierarchies in Endeca.

One of the key benefits of disentangling the asset model from the denormalization logic is that it aligns well with agile project methodologies. As such, editors and developers can use the Page asset in an ad-hoc manner, loading them up with as many attributes as needed for each node. (see endnotes for a discussion of a missing feature to further enhance agile methodologies)

The proposed denormalization model would look something like the following, wherein we use the Sites siteplan assets (i.e. Page assets) to represent not only the navigation, but the collections of attributes to be inherited by the children under each node (very much like flex parents):

[Figure: Endeca + Sites integration modeling 1]

Compare the above with the much simpler modeling possible with leveraging Endeca to do the denormalization for you:

[Figure: Endeca + Sites integration modeling 2]

Individual non-taxonomic assets (e.g. News, Products, White Papers, FAQs, etc.) would have as part of their definition a required attribute that points back to the taxonomy/siteplan node to which it “belongs” (i.e. TaxonomyNode). In other words, each editor would specify where in the navigation tree the current non-taxonomic child node belongs. Example: a PressRelease would point to the News page node, An Event would point to the Events page node, and so on.

At content ingestion/indexing time, the Endeca pipeline loads all the child nodes on the first pass. The pipeline then “appends” additional attributes (as needed) derived from the siteplan/taxonomy(ies) to which each child node belongs. The lookup would be based on the TaxonomyNode id as the common key.

In this way, editors are now free to create and update as many taxonomies as needed, and load these nodes up with as many attributes as necessary. If any new inheritance rules are needed, editors would convey the ingestion rules to the Endeca pipeline developers, who would then modify the pipeline to extract and denormalize as appropriate.

Assumptions:

  • Endeca will index all “searchable” content and provides the content for all guided search navigation pagelets on the Sites-rendered webpage. There is no requirement for Endeca Experience Manager in this solution. OTOH, there is nothing in this solution that would prevent the inclusion of Endeca Experience Manager.
  • Since ATG is not involved in this discussion, the assumption here is that Sites will do 100% of the rendering using its JSPs, with the Endeca Assembler deployed either in the Sites context or remotely and outputting JSON.

Endnote 1:

While it is all well and good that Page assets can be useful for enabling an agile methodology, the trouble is that there is no OOTB way to convert one Page node definition into another Page node definition. For example, let’s say you have a Page asset that represents a product category and initially you specify it to be of type “Section”. Later on, you might discover that you really need a different definition for such a page, perhaps called ProductCategory. Assuming you now have dozens or hundreds of such “old” Section assets that need to be converted to ProductCategory, updating these would be quite onerous. The nice thing about flex assets, though, is that from a database point of view, converting a Page asset of definition X into definition Y requires just a simple update to the flextemplateid field in the Page table for the given record and a cleanup of existing attribute values in the Page_Mungo table for attributes that don’t exist in the new definition. In other words, the system makes it easy to add this functionality, which I feel is a much-needed feature enhancement.

Endnote 2:

Since child assets point to Page assets (and not the other way around), there is no OOTB “view” that shows the relationship of Page to child within the GUI. As such, a missing ingredient for the above proposal would be a customization to the Siteplan tab that shows the children under each Page asset in the righthand search pane.

Endnote 3:

* Note that via a property one can globally turn flex inheritance off completely, but that still leaves flex parents inappropriate for hierarchies, something at which Page assets excel.

Sanity Testing Baseline Updates using Endeca CAS


Introduction

When working with Endeca indexes, a question frequently asked by customers is whether it is possible to insert quality checks into the baseline update procedure, such that if the checks don’t succeed, the baseline update doesn’t proceed. In this blog I will demonstrate a technique using the Endeca CAS API that allows customers to validate the data in a Record Store to ensure the quantity/quality of data, prior to performing a baseline update. This technique can be very useful for ATG customers, for example, who are encountering incomplete or incorrect catalog data in their MDEX. With a few simple validation rules, you can ensure the integrity of your source data before any changes are made, and avoid late detection of issues.

Main Article

In Endeca, the Content Acquisition System (CAS) is responsible for collecting input data from a set of well-defined data sources, transforming the data if necessary, and merging it before passing it off to the indexer (Dgidx). The CAS output, by default, is persisted to a generational, flat file structure known as a Record Store. Each record in a RecordStore consists of an unordered collection of name/value pairs, and has no structural relationships to any other records. Consequently, Endeca RecordStores cannot be queried like relational databases. Fortunately, the CAS RecordStore API does provide methods for iteratively inspecting individual records of a RecordStore, and in this blog I will demonstrate how this API can be used to validate the integrity of the data before proceeding to index.

To start, let’s look at a simple example of how to connect to the CAS Service, locate a RecordStore, and ensure that it contains more than 100 records:

import com.endeca.itl.record.Record;
import com.endeca.itl.recordstore.*;

public class RecordStoreValidator {
    public static void main(String[] args) {
        int count = 0;
        if (args.length != 3) {
            System.out.println("usage: <cas host> <cas port> <rs name>");
            System.exit(-1);
        }
        String casHost = args[0];
        int casPort = Integer.parseInt(args[1]);
        String rsName = args[2];
        RecordStoreLocator locator = RecordStoreLocator.create(casHost, casPort, rsName);
        RecordStore recordStore = locator.getService();
        try {
            TransactionId tid = recordStore.startTransaction(TransactionType.READ);
            RecordStoreReader reader = RecordStoreReader.createBaselineReader(recordStore, tid);
            while (reader.hasNext()) {
                Record record = reader.next();
                count++;
            }
            reader.close();
            if (tid != null) {
                recordStore.rollbackTransaction(tid);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println((count>100)?"SUCCESS":"FAILURE");
    }
}

In the above code, RecordStoreLocator is used to connect to the CAS Service, and locate the named RecordStore. Then RecordStoreReader is used to create a baseline reader and iterate over all records while maintaining a total count. To avoid any changes to the client read state, the transaction is rolled back. Finally, if the total record count is greater than 100, a message of SUCCESS is displayed.

In order to run the example, you’ll need to identify a RecordStore of interest first. If none are available, you can either configure a file system crawl, or deploy one of the sample apps. You can retrieve a list of available RecordStores using the component-manager-cmd, like so:

component-manager-cmd.bat list-components

 
To count the number of records in a RecordStore, you can use:

recordstore-cmd.bat read-baseline -a Discover-data -c

 
When you’ve identified a RecordStore of interest, pass it as an argument string along with the host name and port number of the CAS Service:

java RecordStoreValidator localhost 8500 Discover-data

 
If the record count is greater than 100, you should see a message response of SUCCESS. You could then amend this program to call System.exit() with a value of 0 for success or 1 for failure. Then call it in the baseline_update script and only proceed with the BaselineUpdate if the exit status is 0.
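
For instance, the final println in the example above could be replaced with something like the following so that a calling script can branch on the exit status:

        // Exit with 0 on success and 1 on failure so a calling script can test the status
        if (count > 100) {
            System.out.println("SUCCESS");
            System.exit(0);
        } else {
            System.out.println("FAILURE");
            System.exit(1);
        }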

That would be a simple solution to the problem, but as the complexity of the validation logic grows, such a solution might become difficult to maintain. One way to improve on it is to create an interface for validation tasks. Such separation would allow for more complex validation logic, as well as the ability to selectively enable which validations you would like to perform. Attached below is the source code to such a solution. Rather than accumulate all validation logic into a single class, an abstract class for validation tasks is defined. By extending this class and implementing its abstract methods, you can focus on the validation tasks necessary for your application. Using this class, the above example can be rewritten as:

import com.endeca.itl.record.Record;

public class MinimumRecordThreshold extends RecordStoreValidationTask {
    private int count = 0;
    private int minRecordThreshold;

    public MinimumRecordThreshold(int minRecordThreshold) {
        this.minRecordThreshold = minRecordThreshold;
    }

    @Override
    public boolean checkRecord(final Record record) {
        count++;
        return true;
    }

    @Override
    public boolean doRecordsPass() {
        if (count < minRecordThreshold) {
            setFailureMessage("Insufficient records in RecordStore.");
            return false;
        }
        return true;
    }
}

Then added to the RecordStoreValidator like so:

RecordStoreValidator validator = new RecordStoreValidator("localhost", 8500, "Discover-data");
validator.addValidationTask(new MinimumRecordThreshold(100));

 
With this code, the validator will first call checkRecord() for each record in the specified RecordStore. After all records have been processed, the validator will call doRecordsPass() to determine whether the validation task was successful or not. Since record-level validation is not necessary for the minimum threshold test, the checkRecord() method just increments the count and returns true. If record-level validation were necessary, returning false would cause the validation task to fail.
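
For reference, a minimal sketch of what the RecordStoreValidationTask base class might look like; only checkRecord(), doRecordsPass(), and setFailureMessage() are taken from the examples above, and the rest is an assumption about the attached source:

import com.endeca.itl.record.Record;

public abstract class RecordStoreValidationTask {
    private String failureMessage;

    // Called once for every record read from the RecordStore;
    // returning false fails the validation task at the record level.
    public abstract boolean checkRecord(Record record);

    // Called once after all records have been read;
    // returning false fails the validation task.
    public abstract boolean doRecordsPass();

    protected void setFailureMessage(String message) {
        this.failureMessage = message;
    }

    public String getFailureMessage() {
        return failureMessage;
    }
}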

To integrate the RecordStoreValidator into the baseline update script, edit DataIngest.xml to include the following:

<script id="RecordStoreValidation">
 <bean-shell-script>
    <![CDATA[ 
      RecordStoreValidator validator = new RecordStoreValidator();
      validator.setRecordStoreName("Discover-data");
      validator.addValidationTask(new MinimumRecordThreshold(100));
      boolean success = validator.runAll();
      if (!success) {
        throw new Exception("RecordStore Validation Failed!");
      }
    ]]>
  </bean-shell-script>
</script>

Then, call it in the BaselineUpdate script, prior to acquiring a lock to begin the update:

<script id="BaselineUpdate">
  ...
  // run validations
  RecordStoreValidation.run();

  // obtain lock
  ...
</script>

If the validation is not successful an exception will be thrown, which will cause the baseline update script to fail before the update process begins, hence preventing a baseline update if the RecordStore data does not pass validation.

To make RecordStoreValidator available to your BeanShell scripts, you will need to copy recordstore-validator-1.0.jar to the application directory config/lib/java, and modify the beanshell.imports file to include the following line:

import com.oracle.ateam.endeca.cas.validation.*;

 
You will also need to add the following line to runcommand.bat for runtime support:

set CLASSPATH=%CLASSPATH%;%ENDECA_ROOT%\..\..\CAS\11.1.0\lib\recordstore-api\*

 
And for logging you will need to copy slf4j-jdk14-1.5.2.jar to config/lib/java, and add the following line to logging.properties:

com.oracle.ateam.endeca.cas.validation=DEBUG

 
Then just run the baseline_update script, and if the conditions specified in your validation tasks do not succeed, you should see a message sequence similar to the following:

INFO: Starting baseline update script.
INFO: Opening CAS connection to record store 'Discover-data' on localhost:8500
SEVERE: Validation task 'MinimumRecordThreshold-1' failed with message "Insufficient records in RecordStore. Expected 8000, but found only 5684."
INFO: Completed 1 validation tasks in 3499ms
INFO: 1 out of 1 validations failed
INFO: Validation status: FAILURE
SEVERE: RecordStore Validation Failed!

 
Feel free to improve the code as you see fit. Keep in mind however, that since RecordStores are flat file structures with limited query capabilities, iterating over all records in a record store to check a set of validation constraints can be a lengthy, time-consuming process. Avoid overly complicating your validation tasks, and make sure to filter records that do not need to be validated.

Source Code

The attached source code requires Gradle, Maven, and Java 7 SDK to build. Once extracted, edit scripts/mvn_install.bat to point to your Endeca installation directory. Then run the script to install the dependent libraries into a local Maven repository. Finally, run “gradlew build” to build recordstore-validator-1.0.jar, and “gradlew javadoc” to build the javadocs.

RecordStoreValidatorSource

Special thanks to Greg Eschbacher for the validator idea and for providing the initial implementation of the RecordStoreValidator code.

Three Patterns for Integrating WebCenter Sites with Oracle Commerce


The A-Team is pleased to announce the general availability of the white paper Three Patterns for Integrating WebCenter Sites with Oracle Commerce.

The paper can be downloaded here: Three Patterns for Integrating WebCenter Sites with Oracle Commerce_v1.1.

 

[Figure: side-by-side integration]

Note: sample source code to support integrations with 3rd party apps is available here:

http://www.ateam-oracle.com/exporting-rendered-assets-from-webcenter-sites/

WebCenter Sites Demo Integration with Endeca Guided Search


In early 2014, WebCenter Sites engineering produced a demo Sites + Endeca Guided Search integration for the A-Team to review. The use case for such a demo is quite compelling: pretty much any large Sites-authored website can benefit from Guided Search technology. And while one could implement a guided-search mechanism in WebCenter Sites, it would require a lot of work, as there are no OOTB features of the product to facilitate it. Further, such a bespoke solution would likely not have all the robust tools that come OOTB with Endeca Guided Search. As such, the concept of a Sites + Endeca integration appeals very much to the A-Team. We think it is a natural integration that should have broad market appeal for existing and future Sites customers.

The final goal of the demo:

The following screenshot shows a WebCenter Sites Template rendering a Guided Search payload provided by Endeca of WCS-authored content indexed in the Endeca MDEX.

[Figure: screenshot of a Sites Template rendering Endeca Guided Search results]

Requirements Specific to this Demo

Required Endeca Software

The following Endeca software must be installed prior to installing the integration:

  • Endeca MDEX Engine 6.3.0 (V33381-01 or V33387-01)
  • Endeca Platform Services 6.1.3 (V33317-01 or V33316-01)
  • Tools and Frameworks with Experience Manager 3.1.0 (V33380-01 or V33386-01)
  • Content Acquisition System 3.0.2 (V31463-01 or V31482-01)

NOTE: As many of you will notice, the above does not represent the latest versions of the Commerce product.

Required WebCenter Sites Software

The WebCenter Sites/Endeca Integration has been tested on Oracle WebCenter Sites 11gR1 (11.1.1.6.0). This is available on the Oracle Software Delivery Cloud at https://edelivery.oracle.com.

NOTE: While it has not been tested, the demo should work with WebCenter Sites v11.1.1.8.

The basic pattern suggested by this Demo is as follows:

  • Specific Endeca jars are deployed to the Sites context
  • Some custom elements were created to update the WCS event framework
  • A property file describes the mapping between Sites’ asset attributes and the dimensions needed for Guided Search
  • Sites-authored web-referenceable assets are injected into a CAS Record Store upon save
  • A batch re-indexer can be scripted to run every “n” minutes to update the MDEX
  • The sample application uses the preconfigured “/services/guidedsearch” service to obtain information from the Endeca index
  • Sites Templates, having direct access to the Endeca core classes and therefore to the Endeca Guided Search POJO, can access the payload directly and then convert those values into renderable content
  • All the custom pieces needed for the demo were packaged up in a single zip archive named sites-endeca-integration.zip. Not included in the archive is any productized code from either Sites or Endeca. Thus the expectation of this demo is that you obtain the Endeca jars yourself and deploy them as per the documentation (see table below)

So while the idea of an integration between Sites and Endeca is compelling, as always the “devil is in the details”. To wit: there are a few details missing from the demo that we feel need to be addressed before such a solution could be made public. The two missing features which we feel are critical are:

  • The integration with Endeca should be via REST over HTTP (as opposed to deploying Endeca jars directly in the Sites classpath)
  • The MDEX must support both Global Search of Sites-authored webpages as well as Guided Search of Sites-authored webreferenceable assets

NOTE: the two features above are described in detail in another A-Team paper entitled Three Patterns for Integrating WebCenter Sites with Oracle Commerce, which can be obtained from the A-Team Chronicles blog.

What is in the Demo sites-endeca-integration.zip archive?

The following table lists all the individual components of the demo that are included in the zip archive:

The following files create events in Sites
catalogs/ElementCatalog/OpenMarket/Xcelerate/Endeca/
catalogs/ElementCatalog/OpenMarket/Xcelerate/Endeca/IndexConfig.jsp
catalogs/ElementCatalog/OpenMarket/Xcelerate/Endeca/EnableEvent.jsp
catalogs/ElementCatalog/OpenMarket/Xcelerate/Endeca/Event.jsp
 
The following files are used to create new tables and also update existing tables with new records
catalogs/
catalogs/SystemLocaleString.html
catalogs/Endeca_Q.html
catalogs/EndecaQueues.html
catalogs/SiteCatalog.html
catalogs/ElementCatalog.html
catalogs/AssetListener_reg.html
catalogs/SystemEvents.html
The following file is the documentation:
docs/
docs/Sites-Endeca Integration.doc
 
The following file is the primary jar containing all the classes (shown exploded for your review)
lib/
lib/sites-endeca-integration.jar :
lib/sites-endeca-integration/com/fatwire/endeca/extensions/
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaAssetIdEventListener.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaIndexSource.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaAssetQueueIndexSourceUtil.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaConfig.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaIndexControlHandler.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaIndexSourceMetadata.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaProcessRunner.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaRecordFileWriter.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaRecordStoreWriter.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaRecordWriter.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaSearchEngine.class
lib/sites-endeca-integration/com/fatwire/endeca/extensions/EndecaTransientAssetQueue.class
lib/sites-endeca-integration/com/META-INF/
 
The following files define the Endeca pipeline, the testdata, and the sample sites
samples/application/config/pipeline/
samples/application/config/pipeline/partial_pipeline.epx
samples/application/control/
samples/application/control/load_partial_sites_integration_data.bat
samples/application/control/load_partial_sites_integration_data.sh
samples/application/test_data/
samples/application/test_data/baseline/rs_baseline_dimvals.xml
samples/application/test_data/baseline/rs_baseline_schema.xml
samples/application/test_data/baseline/rs_baseline_data.xml
samples/application/test_data/baseline/rs_baseline_prules.xml
samples/application/test_data/config_api_input/
samples/application/test_data/config_api_input/dimension_values.csv
samples/application/test_data/config_api_input/precedence_rules.csv
samples/application/test_data/config_api_input/schema.csv
samples/application/test_data/partial/
samples/application/test_data/partial/rs_partial_delete_all_data.xml
samples/application/test_data/partial/rs_partial_data.xml
samples/assembler_api/
samples/assembler_api/assembler_api_test.jsp
samples/FirstSiteII/
samples/FirstSiteII/FSIICommon_SideNav_ProductView.jsp
samples/FirstSiteII/FSIIProductDetailView.jsp
samples/FirstSiteII/FSIIProductSideNavView.jsp
samples/sites/classes/com/endeca/infront/refapp/navigation/
samples/sites/classes/com/endeca/infront/refapp/navigation/BasicActionPathProvider.class
samples/sites/classes/
samples/sites/classes/endecafieldnames.properties
samples/sites/
samples/sites/assembler-context.xml
samples/sites/assembler.properties
samples/sites/endeca-url-config.xml
samples/sites/perf-logging-config.xml
samples/sites/web.xml
samples/
samples/endeca-integration.ini

 

For those of you new to integrating Sites with Endeca, it might be worthwhile studying the code samples above for ideas about how you will implement your own integration. You can obtain the above-mentioned zip archive here: sites-endeca-integration. It should be noted that the A-Team does not approve of this solution for real-world customers nor does the A-Team support the included code. In fact we will go on the record here that deploying Endeca jars (or any productized jars for that matter) in the WebCenter Sites context is a bad practice for customers as it makes upgrading nearly impossible. Further, support is made complicated and problematic by doing so. Instead, this demo solution is provided to demonstrate the viability of integrating the two products and allow developers to inspect the solution for concepts to incorporate into their own projects.

A Final Word from our Sponsor:

Partially inspired by this demo, the A-Team released in August 2014 a technical white paper on the A-Team Chronicles blog: Three Patterns for Integrating WebCenter Sites with Oracle Commerce. As described in the paper, code is provided to support Pattern 1. Extending the supplied codebase to support Pattern 2 is currently left up to the reader.

Notes on Querying Endeca from within an ATG Application


Background

On a few projects in 2014, the issue of Endeca’s performance came up. Specifically, applications were seeing a large number of queries and were also generating large response sizes from Endeca. These queries were not being generated by the Assembler API, but were one-off queries created to bring back other information from Endeca.

This guide is to give some tips on how to optimize those queries.

Notes on Endeca Query Response objects

A response from Endeca can consist of a number of different pieces of data:

  • Records (aka, products)
  • Dimensions to navigate on
  • Breadcrumbs (also sometimes called Descriptors), which return info on which things have been navigated on
  • Supplemental objects: Used internally, these bring back meta-data about the rules being executed (such as landing pages, content slots, etc)
  • Dimension search results: These are special queries that search only within dimensions, not products.
  • Key properties: Almost never used. Returns meta-data about the properties and dimensions in the index.
  • Other information about the index (such as the list of available sort keys, search fields, etc). You can’t really turn any of that off.

In an Assembler-based application, the overall flow for a standard page being rendered would be this:

  1. A super-lightweight query is executed that only returns Supplemental objects representing info from Experience Manager
  2. Depending on that meta-data, one or more subsequent queries is executed that will return:
    1. Dimensions
    2. Records
    3. Breadcrumbs
    4. NOT supplemental objects (this is true for patched 10.2, 11.0 and 11.1)

Thus, a standard basic page would generate about 2 Endeca queries to be rendered. There are some cartridges that generate more:

  • Featured Records cartridges generate one query per cartridge, so if you have three Featured Records cartridges, that adds three extra queries

In addition, the Assembler API is usually very good at bringing back only the exact info it needs. This means it will only “open up” the dimensions being requested and return only the attributes on the products specified in the ResultsList configuration.

When it comes to making standalone queries to Endeca, you need to understand the information above so as to NOT bring back any more data than necessary.

Scanning for Standalone Endeca queries in code

The easiest way to scan an application for one-off Endeca queries is to search for “ENEQuery” or “ENEQueryResults” in .java classes. In addition, search for “InvokeAssembler” in .jsp files. You can also search for UrlENEQuery.

If you find many instances of those, each and every query involved should be assessed using the information laid out below.

Improving your queries

Limiting which properties to return on a record

In a standard commerce application, it’s not uncommon for one record (representing a product or SKU) to have 80 or 100 or more properties. These can be things like product ID, UPC, short descriptions, size/color/widths, prices, image URLs, etc.

If you are not careful, it’s very easy to return all 80 or 100 or whatever property values with a standalone query.

To look at what comes back by default, you can look at the orange JSP reference application (typically located at http://localhost:8006/endeca_jspref, with a host of localhost and port of 15000).

The list of properties to be returned can be controlled by using ENEQuery.setSelection(). This requires you to specify every single property (and dimension) to be returned. It is case-sensitive.
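
A minimal sketch of restricting the returned fields with the Presentation API; the property names below are examples only:

import com.endeca.navigation.ENEQuery;
import com.endeca.navigation.FieldList;

// Only request the fields the page actually renders (names are case-sensitive examples)
FieldList fields = new FieldList();
fields.addField("product.repositoryId");
fields.addField("product.displayName");
fields.addField("sku.activePrice");

ENEQuery query = new ENEQuery();
query.setSelection(fields);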

Limit the number of records to return

By default, a query will return 10 records. To limit this, you can use ENEQuery.setNavNumERecs.

In the .reqlog, if you see &nbins=10, that means someone didn’t set this value specifically and is probably using the default.

At the same time, you shouldn’t set this value too large. If you find yourself setting this to 50 or 100, you might be doing something wrong.

Omit Supplemental Objects

A supplemental object is the meta-data about a landing page or content slot. If you use the orange reference app, at the top you’ll see one or more “Unknown Merch Style”. Scrolling to the bottom of page, you’ll see a series of “Matching Supplemental Objects”.

What’s the big deal about these? Well, these can actually get somewhat large in size (for instance, if you have cartridges that allow merchandisers to copy/paste raw HTML). Also, the only real time they need to come back is when doing Assembler queries, not one-off queries.

There’s no flag for turning supplemental objects on/off. However, you can add a merch rule filter that will have the effect of turning them off. (This is what a hotfix for 10.2 and what 11.x do by default. If you look in the .reqlog, you’ll see &merchrulefilter=endeca.internal.nonexistent in some of the queries).

This can be done using ENEQuery.setNavMerchRuleFilter(). Basically any nonsense string here will have the desired effect. This is also a good place to put a message for logging purposes, something like ENEQuery.setNavMerchRuleFilter("topNavigationQuery").

In the .reqlog, you should see &merchrulefilter.

Don’t expose all dimensions

If you look at the orange reference app, you’ll see that the dimensions on the left side are “closed” up. If you click one, the page will refresh and now that dimension will be “opened” up.

If you would like to open up all dimensions, you can use ENEQuery.setNavAllRefinements(true).

However, this can be very expensive. When no dimensions are returned, the MDEX Engine doesn’t have to compute the refinement counts (i.e., “How many records are there for Brand=XYZ?”). Also, opening everything up can inflate the response size greatly, especially for big flag dimensions.

Instead, you should specify which particular dimensions you want to return. Unfortunately, you need to specify the ID of the dimension, not the name.

If you know the IDs of the dimensions you care about, you can use UrlENEQuery.setNe() and pass in a string like "123+234+532".

Looking through the .reqlog, if you see &allgroups=1, that means somewhere someone has setNavAllRefinements(true).

Use record filters instead of keyword search

Let’s say you’re on a product details page. If you know the ID of the product, you have two choices: You can do a keyword search on the ID field passing in the string of the value. Or you can construct a record filter. A record filter is usually faster and cleaner. (There’s no reason to fill your logs with searches that customers didn’t type in).

ENEQuery.setNavRecordFilter() is the method. An example might be: query.setNavRecordFilter("AND(product.id:2342342)").

Use setQueryInfo for logging custom things to Endeca’s .reqlog files

A little-used feature is the ENEQuery.setQueryInfo() method. This lets you stuff any number of key/value pairs that get sent to the MDEX Engine, ignored, but written out to the .reqlog file. This can be useful for adding things like session ID, debug information, etc.

In our case, it is useful to write out why the query is being executed: "pdpBreadcrumbs", "typeahead", etc.

This way, during performance testing, if there are slow or big queries, it will help track them down and help distinguish between real Assembler queries and your one-off queries.

These messages will show up in the .reqlog as &log=
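
A hedged sketch, assuming setQueryInfo() accepts an Endeca PropertyMap of key/value pairs:

import com.endeca.navigation.ENEQuery;
import com.endeca.navigation.PropertyMap;

// Tag the query so it can be identified later in the Dgraph.reqlog (&log=...)
PropertyMap queryInfo = new PropertyMap();
queryInfo.put("purpose", "pdpBreadcrumbs");
queryInfo.put("sessionId", "abc123"); // example value

ENEQuery query = new ENEQuery();
query.setQueryInfo(queryInfo);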

Don’t ever set setNavERecsPerAggrERec to 2

ENEQuery.setNavERecsPerAggrERec() allows you to specify how many records are returned per aggregate record. For example, say you are a clothing website. You probably index by SKU (which would represent a single Size/Color combination for a product). When doing query to Endeca, instead of returning info at a SKU level, you would aggregate things by a rollup key using ENEQuery.setNavRollupKey().

setNavERecsPerAggrERec() allows you to bring back 0, 1, or all SKUs within a product. You should do everything possible to NOT set it to the value of “2”, which means all.

(As a point of reference, ENEQuery has 3 static values representing those numbers. ZERO_ERECS_PER_AGGR, ONE_EREC_PER_AGGR, ALL_ERECS_PER_AGGR).

In the .reqlog, if you see &allbins=2, then that means someone setNavERecsPerAggrERec(ALL_ERECS_PER_AGGR).

Now, this might make things complicated for you. For instance, at Eddie Bauer on the search results page, they wanted to display the color swatches from each different SKU. By setting it to bring back all, they were able to iterate across all of the SKUs in the product to generate that list.

Instead, things were changed so that each SKU was tagged with information about all of the other SKUs. This allowed us to change this from 2 to 1. Response sizes went from 10 megs to 100kb.
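
A short sketch of that preferred configuration, using ENEQuery directly (the rollup key name is an example):

import com.endeca.navigation.ENEQuery;

ENEQuery query = new ENEQuery();
// Aggregate SKU records by product, but only return one representative SKU each
query.setNavRollupKey("product.repositoryId"); // example rollup key
query.setNavERecsPerAggrERec(ENEQuery.ONE_EREC_PER_AGGR);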

Watch out for filtering based on timestamps

For some commerce sites, they might set products to activate during the day (“Starting at 1pm EST, this product should show up, but before that, it shouldn’t”).

One way to do this would be to tag all products with a start date and end date. And then with each query to Endeca, pass along a range filter for the dates.

The problem, however, can be that the MDEX Engine does some internal caching based on these values. If the date value you specify is too granular, then the MDEX won’t work as fast as it could. So don’t specify a timestamp down to the second or millisecond. Try and do timestamps for the hour, or at least chunks of minutes (like 20 or 30 minutes) to ensure that some cache hits occur.

Range filters can be set by using setNavRangeFilters().

In the .reqlog, you can look for &pred . A CRS example might look like: pred=product.endDate%7cGTEQ+1.4163552E12&pred=product.startDate%7cLTEQ+1.4163552E12

Don’t return key properties

This is a little-used feature, so it’s not something you’d come across very often. Key properties return meta-data about the definitions of properties and dimensions themselves. This can be turned on using ENEQuery.setNavKeyProperties(ENEQuery.KEY_PROPS_ALL).

This can greatly inflate the response size of a query from Endeca.

If you do need this for some reason, you should only need to execute the query once, and then cache the results from it.

This can be found in the .reqlog as &keyprops=all

Things that CRS does that aren’t optimal

Careful readers might notice that CRS breaks some of the rules above. In particular:

  • CRS filters based on timestamps
  • CRS used to do setNavERecsPerAggrERec = 2

What would the worst query in the world look like?

As an interesting point of reference, the world’s worst Endeca query would:

  • setNavAllRefinements(true)
  • not use .setSelection()
  • not use .setNavMerchRuleFilter()
  • uses setNavRollupKey()
  • does a wildcard keyword search
  • have a high number of search terms (in addition to the wildcard)
  • setNavNumERecs() to a large value
  • setNavKeyProperties(ENEQuery.KEY_PROPS_ALL)
  • sorts on something not frequently sorted on
  • uses pagination (.setNavERecsOffset()) to go to a high page number
  • use a geospatial filter
  • uses a range filter ( .setNavRangeFilters())

What would the world’s fastest query look like?

  • no keyword search
  • setNavAllRefinements(false)
  • setNavNumERecs(0)
  • setNavMerchRuleFilter("lksdkjfd")
  • doesn’t touch setNavKeyProperties()
  • uses a setNavRecordFilter() for a record filter that had been previously used and basically filters everything out

Understanding the Endeca CAS & EAC APIs


Introduction

I’ve always felt that the best way to understand something is to take it apart and try to put it back together. In this blog we’ll be doing that by deconstructing the Endeca application scripts and reconstructing them in Java, revealing their inner workings and familiarizing developers with the Endeca CAS, RecordStore, and EAC APIs. Beyond exploring these APIs, the solutions presented herein may be useful to Endeca application developers needing greater flexibility and control than the default scripts provide, and to those who prefer to work in Java over BeanShell and shell scripts.

Main Article

The Endeca CAS Server is a Jetty based servlet container that manages record stores, dimensions, and crawling operations. The CAS Server API is an interface for interacting with the CAS Server. By default, the CAS Service runs on port 8500. Similarly, the Endeca EAC Central Server runs on Tomcat, and coordinates the command, control, and monitoring of EAC applications. By default, it runs on port 8888. Each of these servers, and their respective APIs, are explained in the following Endeca documents:

Content Acquisition System Developer’s Guide
Content Acquisition System API Guide
Platform Services Application Controller Guide

We will use these APIs to re-write the scripts generated by the deployment template for the Discover Electronics reference application, using Java instead of shell script.

To begin, we need to generate the scripts that we’ll be converting. Detailed instructions for this procedure are provided in the CAS Quick Start Guide, but the basic syntax for deploying the Endeca Discover Electronics CAS application is:

cd \Endeca\ToolsAndFrameworks\11.1.0\deployment_template\bin
deploy --app C:\Endeca\ToolsAndFrameworks\11.1.0\reference\discover-data-cas\deploy.xml

Make sure to answer N when prompted to install a base deployment.

Once the deploy command has finished, you should see the following files included in the C:\Endeca\Apps\Discover\control directory:

initialize_services.bat      
load_baseline_test_data.bat  
baseline_update.bat          
promote_content.bat          

These are the scripts that we will be re-writing in Java. After running our Java application, we should be able to navigate to the following URLs and see the same results as having executed the above scripts:

http://localhost:8006/discover
http://localhost:8006/discover-authoring

initialize_services

The first script that we will begin analyzing is initialize_services. Opening the file in a text editor, we see that the first thing it does is set some environment variables. Rather than use system variables, it is customary for Java applications to read from property files, so we’ll create a config.properties file to store our configuration, and load it using the following syntax:

Properties configProperties = new Properties();
try {
    configProperties.load(ResourceHelper.class.getClassLoader().getResourceAsStream("config.properties"));
} catch (IOException e) {
    log.error("Cannot load configuration properties.", e);
}

Next, the script checks if the --force argument was specified. If it was, the script removes any existing crawl configuration, record stores, dimension value id managers, and lastly the application. The code below shows how to remove the crawl configuration, record stores, and dimval id managers:

public static CasCrawler getCasCrawler() throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    CasCrawlerLocator locator = CasCrawlerLocator.create(host, port);
    locator.setPortSsl(Boolean.parseBoolean(getConfigProperty("cas.ssl")));
    locator.ping();
    return locator.getService();
}

public static ComponentInstanceManager getComponentInstanceManager() throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    ComponentInstanceManagerLocator locator = ComponentInstanceManagerLocator.create(host, port);
    locator.setPortSsl(Boolean.parseBoolean(getConfigProperty("cas.ssl")));
    locator.ping();
    return locator.getService();
}

public static void deleteCrawl(String id) {
    try {
        getCasCrawler().deleteCrawl(new CrawlId(id));
    } catch (ItlException|IOException e) {
        log.error("Unable to delete crawl '"+id+"'", e);
    }
}

public static void deleteComponentInstance(String name) {
    try {
        getComponentInstanceManager().deleteComponentInstance(new ComponentInstanceId(name));
    } catch (ComponentManagerException|IOException e) {
        log.error("Unable to delete component instance '"+name+"'", e);
    }
}

However, removing the application is a bit more involved and requires interacting with the EAC, whose configuration is stored in AppConfig.xml:

<app appName="Discover" eacHost="jprantza01" eacPort="8888" 
    dataPrefix="Discover" sslEnabled="false" lockManager="LockManager">
  <working-dir>${ENDECA_PROJECT_DIR}</working-dir>
  <log-dir>./logs</log-dir>
</app>

So we need to load AppConfig.xml, which is a Spring-based ApplicationContext configuration file:

String appConfig = getConfigProperty("app.config");
Resource appConfigResource = new FileSystemResource(appConfig);
if (!appConfigResource.exists()) {
    appConfigResource = new ClassPathResource(appConfig);
}
if (!appConfigResource.exists()) {
    log.error("Cannot load application configuration: "+appConfig);
} else {
    XmlBeanDefinitionReader xmlReader = new XmlBeanDefinitionReader(appContext);
    xmlReader.loadBeanDefinitions(appConfigResource);
    PropertyPlaceholderConfigurer propertySubstituter = new PropertyPlaceholderConfigurer();
    propertySubstituter.setIgnoreResourceNotFound(true);
    propertySubstituter.setIgnoreUnresolvablePlaceholders(true);
    appContext.addBeanFactoryPostProcessor(propertySubstituter);
    appContext.refresh();
}

Note that the propertySubstituter (PropertyPlaceholderConfigurer) is necessary to allow for expansion of properties like ${ENDECA_PROJECT_DIR}. These properties must exist in your environment.

Once the appContext has been loaded, we can remove an app by retrieving all beans of type Component or CustomComponent and removing their definitions with:

public static void removeApp(String appName) {
    try {
        Collection<Component> components = getAppContext().getBeansOfType(Component.class).values();
        if (components.size() > 0) {
            Application app = toApplication(components.iterator().next());
            if (app.isDefined() && app.getAppName().equals(appName)) {
                Collection<CustomComponent> customComponents = getAppContext().getBeansOfType(CustomComponent.class).values();
                for (CustomComponent customComponent: customComponents) {
                    try {
                        customComponent.removeDefinition();
                    } catch (EacComponentControlException e) {
                        log.error("Unable to remove definition for "+customComponent.getElementId(), e);
                    }
                }
                app.removeDefinition();
            }
            else {
                log.warn("Application '"+appName+"' is not defined.");
            }
        }
    }
    catch (AppConfigurationException|EacCommunicationException|EacProvisioningException e) {
        log.error("Unable to remove application '"+appName+"'", e);
    }
}

Provided that the app state is clean, the script then goes on to create the record stores, create the dimension value id managers, and set the configuration on the data record store, which can be accomplished using the following code:

public static void createComponentInstance(String type, String name) {
    try {
        getComponentInstanceManager().createComponentInstance(new ComponentTypeId(type), new ComponentInstanceId(name));
    } catch (ComponentManagerException|IOException e) {
        log.error("Unable to create "+type+" instance '"+name+"'", e);
    }
}

public static void setConfiguration(RecordStore recordStore, File configFile) {
    try {
        recordStore.setConfiguration(RecordStoreConfiguration.load(configFile));
    } catch (RecordStoreConfigurationException e) {
        StringBuilder errorText = new StringBuilder();
        for (RecordStoreConfigurationError error: e.getFaultInfo().getErrors()) {
            errorText.append(error.getErrorMessage()).append("\n");
        }
        log.error("Invalid RecordStore configuration:\n"+errorText);
    } catch (RecordStoreException e) {
        log.error("Unable to set RecordStore configuration", e);
    }
}

It then calls out to the following BeanShell script, found in InitialSetup.xml:

<script id="InitialSetup">
  <bean-shell-script>
    <![CDATA[ 
  IFCR.provisionSite();
  CAS.importDimensionValueIdMappings("Discover-dimension-value-id-manager", 
      	InitialSetup.getWorkingDir() + "/test_data/initial_dval_id_mappings.csv");
    ]]>
  </bean-shell-script>
</script>

Now, if we wanted to convert these scripts to Java as well, we could do the following:

IFCRComponent ifcr = getAppContext().getBean("IFCR", IFCRComponent.class);
ifcr.provisionSite();
...

But to keep this exercise simple, I chose not to convert the BeanShell scripts, and rather to leave it as an exercise for the reader. All that the BeanShell scripts do is bind to Spring Beans that are defined elsewhere in the configuration, and call their Java methods. For example, the IFCR component is defined in WorkbenchConfig.xml.

Instead, to execute the BeanShell scripts, you can use the convenience method invokeBeanMethod():

try {
    invokeBeanMethod("InitialSetup", "run");
} catch (IllegalAccessException|InvocationTargetException e) {
    log.warn("Failed to configure EAC application. Services not initialized properly.", e);
    releaseManagedLocks();
}

After the initial setup is complete, we can create the crawl configuration using the following code:

public static void createCrawl(CrawlConfig config) {
    try {
        List<ConfigurationMessage> messages = getCasCrawler().createCrawl(config);
        StringBuilder messageText = new StringBuilder();
        for (ConfigurationMessage message: messages) {
            messageText.append(message.getMessage()).append("\n");
        }
        log.info(messageText.toString());
    }
    catch (CrawlAlreadyExistsException e) {
        log.error("Crawl unsuccessful. A crawl with id '"+config.getCrawlId()+"' already exists.");
    }
    catch (InvalidCrawlConfigException|IOException e) {
        log.error("Unable to create crawl "+config.getCrawlId(), e);
    }
}

Finally, to import the content we can use either invokeBeanMethod() to call methods on the IFCR component, or look up the IFCRComponent using getBean() and call the import methods on it directly.

load_baseline_test_data

The next script, load_baseline_test_data, is responsible for loading the test data into the record stores. The two record stores that need to be populated are Discover-data and Discover-dimvals, using data from the following files:

  • Discover-data: C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_data.xml.gz
  • Discover-dimvals: C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_dimvals.xml.gz

To do this, we’ll first need to create or locate the record stores:

public static RecordStore getRecordStore(final String instanceName) throws IOException {
    String host = getConfigProperty("cas.host");
    int port = Integer.parseInt(getConfigProperty("cas.port"));
    RecordStoreLocator locator = RecordStoreLocator.create(host, port, instanceName);
    locator.ping();
    return locator.getService();
}

Then, the following code can be used to load the data:

public boolean loadData(final String recordStoreName, final String dataFileName, final boolean isBaseline) {
    File dataFile = new File(dataFileName);
    if (!dataFile.exists() || !dataFile.isFile()) { // verify file exists
        log.error("Invalid data file: " + dataFile);
        return false; // failure
    }
    TransactionId txId = null;
    RecordReader reader = null;
    RecordStoreWriter writer = null;
    RecordStore recordStore = null;
    int numRecordsWritten = 0;
    try {
        recordStore = getRecordStore(recordStoreName);
        txId = recordStore.startTransaction(TransactionType.READ_WRITE);
        reader = RecordIOFactory.createRecordReader(dataFile);
        writer = RecordStoreWriter.createWriter(recordStore, txId, 500);
        if (isBaseline) {
            writer.deleteAll();
        }
        for (; reader.hasNext(); numRecordsWritten++) {
            writer.write(reader.next());
        }
        close(writer); // must close before commit
        recordStore.commitTransaction(txId);
        log.info(numRecordsWritten + " records written.");
    }
    catch (IOException|RecordStoreException e) {
        log.error("Unable to update RecordStore '"+recordStoreName+"'", e);
        rollbackTransaction(recordStore, txId);
        return false; // failure
    }
    finally {
        close(reader);
        close(writer);
    }
    return true; // success
}

This code opens the record store for write access, removes all existing records, iterates through all records in the data file, and writes them to the record store. It then either commits or rolls back the transaction and closes any resources. This is called once for each record store, and that's all the load_baseline_test_data script does.
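
For example, the Loader class might invoke this method once per record store, using the default Discover paths listed above (whether Loader exposes a no-argument constructor is an assumption of this sketch):

Loader loader = new Loader();   // assumes a no-arg constructor
boolean ok = loader.loadData("Discover-data",
        "C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_data.xml.gz", true);
ok &= loader.loadData("Discover-dimvals",
        "C:/Endeca/Apps/Discover/test_data/baseline/rs_baseline_dimvals.xml.gz", true);
if (!ok) {
    log.error("Baseline data load failed; see earlier log messages for details.");
}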

baseline_update & promote_content

The last two scripts, baseline_update and promote_content, simply call out to the BeanShell scripts ‘BaselineUpdate’ and ‘PromoteAuthoringToLive’, which reside in DataIngest.xml, and WorkbenchConfig.xml respectively. BaselineUpdate will run the crawl, update and distribute the indexes. PromoteAuthoringToLive will export the configurations to the LiveDgraphCluster, and update the assemblers on LiveAppServerCluster. Both of these BeanShell scripts can be called by using either invokeBeanMethod() or getBean().
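
Since both scripts are plain BeanShell, they can be invoked from Java the same way InitialSetup was earlier; the method name "run" is assumed here to match that example:

try {
    invokeBeanMethod("BaselineUpdate", "run");
    invokeBeanMethod("PromoteAuthoringToLive", "run");
} catch (IllegalAccessException|InvocationTargetException e) {
    log.error("Baseline update or promotion failed.", e);
}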

Source Code

Attached below is a set of Java files that implements the same behavior as the application scripts, using the methods outlined above. The class files reflect the scripts they are modeled after:

Script                      Java Class
initialize_services         com.oracle.ateam.endeca.example.itl.Initializer
load_baseline_test_data     com.oracle.ateam.endeca.example.itl.Loader
baseline_update             com.oracle.ateam.endeca.example.itl.Updater
promote_content             com.oracle.ateam.endeca.example.itl.Promoter

You can run each Java class individually, or you can run everything all at once by using com.oracle.ateam.endeca.example.itl.Driver. Included in the distribution are build scripts, run scripts, and sample configuration files. If you have Endeca installed in a directory other than the default, then you may need to modify some files slightly.

Hopefully this exercise has helped eliminate some of the mystery behind what these scripts actually do. Feel free to modify the code as you need, but keep in mind that new product releases may modify the deployment templates, so keep an eye out for changes if you decide to incorporate this code into your solutions.

The attached source code requires Gradle, Maven, and Java 7 SDK to build. Once extracted, edit “scripts/mvn_install.bat” to point to your Endeca installation directory. Then run the script to install the dependent libraries into a local Maven repository. Finally, run “gradlew build” to build “discover_data_cas_java-1.0.jar”, and “gradlew javadoc” to build the javadocs.

DiscoverDataCASJavaSource

Notes on Querying Endeca from within an ATG Application


Background

On a few projects in 2014, the issue of Endeca’s performance came up. Specifically, applications were seeing a large number of queries and were also generating large response sizes from Endeca. These queries were not being generated by the Assembler API, but were one-off queries created to bring back other information from Endeca.

This article will give some tips on how to optimize those queries.

Notes on Endeca Query Response objects

A response from Endeca can consist of a number of different pieces of data:

  • Records (aka, products)
  • Dimensions to navigate on
  • Breadcrumbs (also sometimes called Descriptors), which describe the refinements that have already been selected
  • Supplemental objects: Used internally, these bring back meta-data about the rules being executed (such as landing pages, content slots, etc.)
  • Dimension search results: These are special queries that search only within dimensions, not products.
  • Key properties: Almost never used. Returns meta-data about the properties and dimensions in the index.
  • Other information about the index (such as the list of available sort keys, search fields, etc). You can’t really turn any of that off.

In an Assembler based application, the overall flow for a standard page being rendered would be this:

First, a super-lightweight query is executed that only returns Supplemental objects representing information from Experience Manager. Depending on that meta-data, one or more subsequent queries are executed that will return:

  1. Dimensions
  2. Records
  3. Breadcrumbs
  4. NOT supplemental objects (this is true for patched 10.2, 11.0 and 11.1)

Thus a standard basic page would generate about two Endeca queries to be rendered. There are some cartridges that generate more:

  • Featured Records cartridges generate one query per cartridge, so if you have three Featured Records cartridges, there would be three extra queries.

In addition, the Assembler API is usually very good about only bringing back the exact information it needs. This means that it will only “open up” the dimensions being requested, and will return back the attributes on the products specified in the ResultsList configuration.

When it comes to making standalone queries to Endeca, you need to understand the information above so as to NOT bring back any more data than necessary.

Scanning for Standalone Endeca queries in code

The easiest way to scan an application for one-off Endeca queries is to search for “ENEQuery” or “ENEQueryResults” in your .java classes. In addition, search for “InvokeAssembler” in your .jsp files. You can also search for UrlENEQuery.

If you find many instances of those, each and every query involved should be assessed using the information laid out below.

Improving your queries

Limiting which properties to return on a record

In a standard commerce application, it’s not uncommon for one record (representing a product or SKU) to have 80 or 100 or more properties. These can be things like product ID, UPC, short descriptions, size/color/widths, prices, image URLs, etc.

If you are not careful, it's very easy for a standalone query to return all 80 or 100 property values.

To look at what comes back by default, you can look at the orange JSP reference application (typically located at http://localhost:8006/endeca_jspref, with a host of localhost and port of 15000).

The list of properties to be returned can be controlled by using ENEQuery.setSelection(). This requires you to specify every single property (and dimension) to be returned, and the names are case-sensitive.

Limit the number of records to return

By default, a query will return 10 records. To limit this, you can use ENEQuery.setNavNumERecs.

In the .reqlog, if you see &nbins=10, that means that someone didn’t set this value specifically and is probably using the default.

At the same time, you shouldn’t set this value to be too large. If you find yourself setting this to 50 or 100, you might be doing something wrong.
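
Putting the last two points together, here is a minimal sketch of a well-behaved standalone query (classes are from the com.endeca.navigation package; the property names are examples only and must match your index exactly, including case):

ENEQuery query = new ENEQuery();

// Return only the properties and dimensions this page actually renders.
FieldList fields = new FieldList();
fields.addField("product.repositoryId");   // example property names only
fields.addField("product.displayName");
fields.addField("product.listPrice");
query.setSelection(fields);

// Only ask for as many records as the page displays (here, a 12-item grid).
query.setNavNumERecs(12);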

Omit Supplemental Objects

A supplemental object is the meta-data about a landing page or content slot. If you use the orange reference application, at the top you'll see one or more “Unknown Merch Style” entries. Scrolling to the bottom of the page, you'll see a series of “Matching Supplemental Objects”.

What's the big deal about these? Well, they can actually get somewhat large (for instance, if you have cartridges that allow merchandisers to copy/paste raw HTML). Also, the only time they really need to come back is for Assembler queries, not one-off queries.

There’s no flag for turning supplemental objects on/off. However, you can add a merch rule filter that will have the effect of turning them off. (This is what a hotfix for 10.2 and what 11.x do by default. If you look in the .reqlog, you’ll see &merchrulefilter=endeca.internal.nonexistent in some of the queries).

You can set this using ENEQuery.setNavMerchRuleFilter(). Any nonsense string here will have the desired effect. This is also a good place to put a message for logging purposes, for example ENEQuery.setNavMerchRuleFilter("topNavigationQuery").

In the .reqlog, you should then see &merchrulefilter in the queries.

Don’t expose all dimensions

If you look at the orange reference app, you’ll see that the dimensions on the left side are “closed” up. If you click one, the page will refresh and now that dimension will be “opened” up.

If you would like to open up all dimensions, you can use ENEQuery.setNavAllRefinements(true).

However, this can be very expensive. When dimensions aren't returned, the MDEX Engine doesn't have to compute the refinement counts (that is, “How many records are there for Brand=XYZ?”); opening them all forces that work for every dimension. It can also greatly inflate the response size, especially for large, flat dimensions.

Instead, you should specify which particular dimensions you want to return. Unfortunately, you need to specify the ID of the dimension, not the name.

If you know the IDs of the dimensions you care about, you can use UrlENEQuery.setNe() and pass in a string like "123+234+532".

Looking through the .reqlog, if you see &allgroups=1, that means somewhere someone has called setNavAllRefinements(true).
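
A minimal sketch, again using classes from com.endeca.navigation; the dimension ids below are placeholders, so substitute the real ids from your own index:

// Exception handling omitted for brevity; wrap in try/catch as appropriate.
UrlENEQuery query = new UrlENEQuery("N=0", "UTF-8");
query.setNavAllRefinements(false);   // don't open everything
query.setNe("123+234+532");          // placeholder dimension ids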

Use record filters instead of keyword search

Let's say you're on a product details page. If you know the ID of the product, you have two choices: do a keyword search against the ID field, passing in the value as a string, or construct a record filter. A record filter is usually faster and cleaner, and there's no reason to fill your logs with searches that customers didn't type in.

ENEQuery.setNavRecordFilter() is the method. An example might be: query.setNavRecordFilter("AND(product.id:2342342)").
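
A minimal sketch of a record-filter lookup for a product details page (classes from com.endeca.navigation; the property name and id are examples, and the property must be enabled for record filtering in your index):

ENEQuery query = new ENEQuery();
query.setNavDescriptors(new DimValIdList("0"));       // root navigation state, no keyword search
query.setNavRecordFilter("AND(product.id:2342342)");  // exact-match filter instead of a search
query.setNavNumERecs(1);                              // we only expect one record back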

Use setQueryInfo for logging custom things to Endeca’s .reqlog files

A little-used feature is the ENEQuery.setQueryInfo() method. It lets you attach any number of key/value pairs to the query; the MDEX Engine ignores them but writes them out to the .reqlog file. This can be useful for adding things like session ID, debug information, etc.

In our case, it is useful to record why the query is being executed: “pdpBreadcrumbs”, “typeahead”, etc.

This way, if there are slow or big queries found during performance testing, it will help track them down and help distinguish between real Assembler queries and your one-off queries.
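
A minimal sketch (classes from com.endeca.navigation; the key names and values are just examples):

ENEQuery query = new ENEQuery();   // the one-off query being built

PropertyMap queryInfo = new PropertyMap();
queryInfo.put("purpose", "pdpBreadcrumbs");   // why this query exists
queryInfo.put("sessionId", "abc123");         // placeholder value for illustration
query.setQueryInfo(queryInfo);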

These messages will show up in the .reqlog as &log=

Don’t ever set setNavERecsPerAggrERec to 2

ENEQuery.setNavERecsPerAggrERec() allows you to specify how many records are returned per aggregate record. For example, say you are a clothing website. You probably index by SKU (which would represent a single size/color combination for a product). When querying Endeca, instead of returning info at the SKU level, you would aggregate things by a rollup key using ENEQuery.setNavRollupKey().

setNavERecsPerAggrERec() allows you to bring back 0, 1, or all SKUs within a product. You should do everything possible to NOT set it to the value of “2”, which means all of them.

(As a point of reference, ENEQuery has 3 static values representing those numbers. ZERO_ERECS_PER_AGGR, ONE_EREC_PER_AGGR, ALL_ERECS_PER_AGGR).

In the .reqlog, if you see &allbins=2, then that means someone called setNavERecsPerAggrERec(ALL_ERECS_PER_AGGR).
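
A minimal sketch of the preferred setting when rolling up SKUs to products; the rollup key name here is an example only:

ENEQuery query = new ENEQuery();
query.setNavRollupKey("product.repositoryId");              // example rollup key
query.setNavERecsPerAggrERec(ENEQuery.ONE_EREC_PER_AGGR);   // one representative SKU per product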

Now, this might make things complicated for you. For instance, one site wanted to display the color swatches from each SKU on the search results page. By setting this to bring back all SKUs, they were able to iterate across every SKU in the product to generate that list.

Instead, things were changed so that each SKU was tagged with information about all of the other SKUs. This allowed the value to be changed from 2 to 1, and response sizes went from roughly 10 MB to 100 KB.

Watch out for filtering based on timestamps

Some commerce sites set products to activate during the day (“Starting at 1pm EST, this product should show up, but before that, it shouldn't”).

One way to do this is to tag all products with a start date and end date, and then pass a range filter for those dates along with each query to Endeca.

The problem, however, is that the MDEX Engine does some internal caching based on these values. If the date value you specify is too granular, the MDEX won't work as fast as it could. So don't specify a timestamp down to the second or millisecond; instead, round timestamps to the hour, or at least to chunks of minutes (like 20 or 30 minutes), to ensure that some cache hits occur.

Range filters can be set by using setNavRangeFilters().

In the .reqlog, you can look for &pred . A CRS example might look like: pred=product.endDate%7cGTEQ+1.4163552E12&pred=product.startDate%7cLTEQ+1.4163552E12
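
As a sketch of the caching point above, round the current time down to the hour before building the range filter value; that way every query issued within the same hour sends an identical filter and can reuse MDEX cache entries:

long hourMs = 60L * 60L * 1000L;
long nowRoundedToHour = (System.currentTimeMillis() / hourMs) * hourMs;
// nowRoundedToHour would then be used as the value in the range filters passed to
// ENEQuery.setNavRangeFilters(), e.g. product.startDate LTEQ nowRoundedToHour.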

Don’t return key properties

This is a little-used feature, so it’s not something you’d come across very often. Key properties return meta-data about the definitions of properties and dimensions themselves. This can be turned on using ENEQuery.setNavKeyProperties(ENEQuery.KEY_PROPS_ALL).

This can greatly inflate the response size of a query from Endeca.

If you do need this for some reason, you should only need to execute the query once, and then cache the results from it.

This can be found in the .reqlog as &keyprops=all

Things that CRS does that aren’t optimal

Careful readers might notice that CRS breaks some of the rules above. In particular:

  • CRS filters based on timestamps
  • CRS used to do setNavERecsPerAggrERec = 2

What would the worst query in the world look like?

As an interesting point of reference, the world’s worst Endeca query would:

  • setNavAllRefinements(true)
  • not use .setSelection()
  • not use .setNavMerchRuleFilter()
  • use setNavRollupKey()
  • do a wildcard keyword search
  • have a high number of search terms (in addition to the wildcard)
  • setNavNumERecs() set to a large value
  • setNavKeyProperties(ENEQuery.KEY_PROPS_ALL)
  • sort on something not frequently sorted on
  • use pagination (.setNavERecsOffset()) to go to a high page number
  • use a geospatial filter
  • use a range filter (.setNavRangeFilters())

What would the world’s fastest query look like?

  • no keyword search
  • setNavAllRefinements(false)
  • setNavNumERecs(0)
  • setNavMerchRuleFilter(“lksdkjfd”)
  • doesn’t touch setNavKeyProperties()
  • uses setNavRecordFilter() with a record filter that has been used recently (so it's cached) and that filters almost everything out (see the sketch below)
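
Putting those points together, here is a minimal sketch of such a lightweight, navigation-only query (classes from com.endeca.navigation; the merch rule filter string is just a label):

ENEQuery query = new ENEQuery();
query.setNavDescriptors(new DimValIdList("0"));        // root navigation, no keyword search
query.setNavNumERecs(0);                               // no records needed
query.setNavAllRefinements(false);                     // don't open every dimension
query.setNavMerchRuleFilter("lightweightHeaderQuery"); // suppress supplemental objects and label the query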

Configuring OAM SSO for ATG BCC and Endeca XM


Introduction

Single sign-on, or “SSO” as it's commonly referred to, is an authentication method that allows a user access to multiple applications through a single, secure point of entry. Rather than authenticate separately for each application, users authenticate once through a centralized service. The benefits of SSO to end users are obvious, but there are also many cost and compliance advantages that are of interest to large organizations, which is why Oracle's enterprise customers have increasingly demanded SSO integration with Oracle Access Manager (OAM). With the introduction of Oracle Commerce 11 they now have it, and in this blog I will demonstrate how to use OAM to enable SSO between the ATG Business Control Center (BCC) and Endeca Experience Manager (XM).

Main Article

E-commerce applications are rarely simple. Often they require access to a variety of disparate systems, including inventory, fulfillment, service center, and marketing systems. A common obstacle when working with such heterogeneous systems is authenticating and preserving identity integrity. In response, many organizations choose to incorporate single sign-on solutions into their integration architecture. To meet customer demands, as well as continue its vision of unifying the Commerce toolset, Oracle has introduced in its Commerce 11.0 release a Commerce-only SSO solution that comes standard, and an Enterprise SSO solution that leverages Oracle Access Manager. The focus of this article is on the latter, illustrating how OAM can be used to provide single sign-on capability between the ATG BCC and Endeca Workbench/XM, as well as other Oracle and non-Oracle products.

Oracle Access Manager (OAM) is an industry-leading Web Access Management (WAM) solution that provides Web Single Sign-On, centralized policy administration, real-time session management and auditing. It is core to Oracle's Access Management platform. OAM enforces access policies using web server agents called WebGates. The WebGates intercept site traffic and verify that the user is authenticated and authorized to access the requested resource. If the user isn't yet authenticated, the WebGate redirects the user to a login page, which validates the user's credentials against a user repository. Once authenticated, a session is established on the Access Manager server, and as the user tries to access different applications and resources, the Access Manager server evaluates whether the user is permitted to access that particular resource, and conveys its decision back to the WebGate for enforcement.

The procedure outlined below demonstrates how to install and configure OAM for use with Oracle Commerce 11 for authentication only. Authorizations are still managed by the respective product consoles. This article assumes that the reader is familiar with Oracle Commerce, but new to OAM. As such, I demonstrate the basic install procedure for OAM. This information is no substitute for product installation and configuration documentation, and may be incomplete. It is therefore recommended that the reader first review the product documentation and use this article only to augment or bolster their understanding.

Installing OAM 11gR2

To configure Oracle Commerce for SSO with Oracle Access Manager, an OAM environment must exist and be accessible to the Commerce servers. Below is a basic installation procedure for OAM on a Linux based environment, intended to help Commerce developers configure a local environment for testing and further exploration. If you have access to an existing OAM environment, then feel free to skip this section and continue to Commerce configuration, noting the host and port differences.

Installation Procedure

  1. Start by downloading the following software:
    • Oracle Identity and Access Management 11g (11.1.2.2.0)
    • Oracle Fusion Middleware Repository Creation Utility 11g (11.1.2.2.0)
    • Oracle Access Manager OHS 11g WebGates 11.1.2.2.0
    • Oracle WebTier Utilities 11gR1 (11.1.1.7)
    • Oracle WebLogic Server 10.3.6
    • Java SE Development Kit 6u45
  2. Install the RCU schema
    Extract “ofm_rcu_linux_11.1.2.2.0_64_disk1_1of1.zip” and run rcu in a 32-bit shell:
    linux32 bash
    ./rcuHome/bin/rcu
    

    Database Type: Oracle Database
    Host Name: localhost
    Port: 1521
    Service Name: orcl
    Username: sys
    Password: password

    Select “Oracle Access Manager” under Identity Management
    Use the same password for all schemas: welcome1

  3. Install Java and WebLogic Server
    Extract “jdk-6u45-linux-x64.bin” to /app/oracle/product/fmw11g/jdk160_45. Then run:
    java -jar wls1036_generic.jar
    

    Install to: /app/oracle/product/fmw11g

    If the GUI doesn’t start, try:

    yum install libXtst.i686
    
  4. Install Oracle Access Manager
    Extract “ofm_iam_generic_11.1.2.2.0_disk1_1of2.zip” and “ofm_iam_generic_11.1.2.2.0_disk1_2of2.zip” and run the OAM installer:
    ./Disk1/runInstaller -jreLoc /app/oracle/product/fmw11g/jdk160_45
    

    Oracle Middleware Home: /app/oracle/product/fmw11g
    Oracle Home Directory: idm_11.1.2

    If the Prerequisite Checks fail during installation, try:

    yum install compat-libcap1
    yum install compat-libstdc++-33.i686
    

    Note: To remove OAM, you can run the installer again with the -deinstall option.

    cd /app/oracle/product/fmw11g
    ./idm_11.1.2/oui/bin/runInstaller -deinstall
    
  5. Create a WebLogic Domain for the OAM Servers
    . /app/oracle/product/fmw11g/wlserver_10.3/server/bin/setWLSEnv.sh
    /app/oracle/product/fmw11g/wlserver_10.3/common/bin/config.sh
    

    Select:
    • Oracle Access Management
    • Oracle Enterprise Manager

    Domain Name: idm_domain
    Password: welcome1

    Select both Schemas and change:

    DBMS/Service: orcl
    Host Name: localhost
    Password: welcome1

  6. Upgrade Schemas using Patch Assistant
    Run the Patch Assistant:
    /app/oracle/product/fmw11g/oracle_common/bin/psa
    

    Select:
    • Oracle Access Manager

    Connect String: localhost:1521/ORCL
    DBA User Name: sys as sysdba
    DBA Password: password

    Schema User Name: DEV_IAU
    Schema Password: welcome1

  7. Create a Security Store for the WLS Domain
    cd /app/oracle/product/fmw11g/idm_11.1.2/common
    bin/wlst.sh tools/configureSecurityStore.py -d /app/oracle/product/fmw11g/user_projects/domains/idm_domain -c IAM -p welcome1 -m create
    

    Note: if you encounter any issues, re-run configureSecurityStore.py using “-m validate_fix”, instead of “-m create”. Then run “-m validate” to verify the configuration.

  8. Start WebLogic Servers
    cd /app/oracle/product/fmw11g/user_projects/domains/idm_domain
    ./startWebLogic.sh
    ./bin/startManagedWebLogic.sh oam_server1
    

    The following URLs should now be accessible:
    http://localhost:7001/console
    http://localhost:7001/em
    http://localhost:7001/oamconsole

  9. Install Oracle WebTier Utilities 11gR1
    Extract “ofm_webtier_linux_11.1.1.7.0_64_disk1_1of1.zip” and execute ./Disk1/runInstaller
    Select “Install Software – Do Not Configure”

    Oracle Middleware Home: /app/oracle/product/fmw11g
    Oracle Home Directory: webtier_11.1

    /app/oracle/product/fmw11g/webtier_11.1/bin/config.sh
    

    Select:
    • Oracle HTTP Server
    • Associate Selected Components with WebLogic Domain

    Make sure the WLS Admin Server is running, and specify the OAM Admin port number (7001).

    Instance Home Location: /app/oracle/product/fmw11g/webtier_11.1/instances/instance1
    Instance Name: instance1
    OHS Component Name: ohs1

  10. Install OAM 11gR2 (11.1.2.2) Webgate for OHS 11gR1 (11.1.1.7)
    Extract “ofm_webgates_generic_11.1.2.2.0_disk1_1of1.zip” and run Installer:
    ./Disk1/runInstaller -jreLoc /app/oracle/product/fmw11g/jdk160_45
    

    Oracle Middleware Home: /app/oracle/product/fmw11g
    Oracle Home Directory: webgate_11.1.2

    cd /app/oracle/product/fmw11g/webgate_11.1.2/webgate/ohs/tools/deployWebGate
    ./deployWebGateInstance.sh -w /app/oracle/product/fmw11g/webtier_11.1/instances/instance1/config/OHS/ohs1 -oh /app/oracle/product/fmw11g/webgate_11.1.2
    
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/app/oracle/product/fmw11g/webtier_11.1/lib:/app/oracle/product/fmw11g/webgate_11.1.2/webgate/ohs/lib
    
    cd /app/oracle/product/fmw11g/webgate_11.1.2/webgate/ohs/tools/setup/InstallTools
    ./EditHttpConf -w /app/oracle/product/fmw11g/webtier_11.1/instances/instance1/config/OHS/ohs1 -oh /app/oracle/product/fmw11g/webgate_11.1.2 -o webgate.conf
    
  11. Start OHS
    cd /app/oracle/product/fmw11g/webtier_11.1/instances/instance1/bin
    ./opmnctl startall
    

    To verify that OHS started successfully, open a browser and access the following URL:
    http://localhost:7777

    If you don’t see a welcome page, then the server is not running. Check the OHS log in the diagnostics directory and ensure that there are no errors being reported. If you see the following error: “libexpat.so.0: cannot open shared object file” then you may need to install libexpat using the procedure below.

    Download the latest Oracle Linux 6 repo configuration file and install compat-expat1:

    su - root
    cd /etc/yum.repos.d
    wget http://public-yum.oracle.com/public-yum-ol6.repo
    yum install compat-expat1
    
  12. Uninstall Procedure (in case of emergency)
    In the event that something goes wrong and you need to remove and re-install the products, the following commands can be used to uninstall everything:
    cd /app/oracle/product/fmw11g
    ./webgate_11.1.2/oui/bin/runInstaller -deinstall
    ./webtier_11.1/oui/bin/runInstaller -deinstall
    ./idm_11.1.2/oui/bin/runInstaller -deinstall
    <RCU_INSTALLER>/rcuHome/bin/rcu   (select the Drop Schema option)
    

    Optional:

    ./oracle_common/oui/bin/runInstaller -jreLoc /app/oracle/product/fmw11g/jdk160_45 -deinstall
    rm -f /app/oracle/product/fmw11g
    

This should be sufficient for a simple OAM installation. For configurations beyond the scope of this basic install procedure, or for more detailed instructions, please refer to the product installation documentation for OAM, which can be found here:

Quick Installation Guide for Oracle Identity and Access Management
Installation Guide for Oracle Identity and Access Management

Configuring Oracle HTTP Server (OHS) Proxy

In order for OAM to intercept Commerce requests the traffic must be routed through an Oracle HTTP Server (OHS) where an OAM WebGate agent resides. In steps 9-11 above, we installed and configured an OHS server with a WebGate agent, but we need to amend this configuration to proxy requests from OHS to the ATG BCC and Endeca Workbench/XM. Assuming that your ATG environment will be running on Oracle WebLogic Server, the simplest way to achieve this is to use separate OHS Virtual Hosts for the BCC and XM, and to proxy from the root path using the WebLogic Proxy Plug-in.

In my setup, I chose to install everything into virtualized environments using Oracle VirtualBox. This isn't necessary, but can be useful if you're working with multiple versions of these products. I installed the Commerce software in a separate Virtual Machine (VM) from the Security software to avoid possible port conflicts and to increase the reusability of my VMs, but this isn't necessary either. If you wanted to install everything in one environment, that would work as well, provided you have the hardware to support it. My VMs communicate with one another by leveraging a feature new to VirtualBox 4.3+ known as “NAT Networking”, which allows virtual machines to talk to each other on the same host and communicate with the outside world.

In my configurations below, you'll see references to the hostnames “hadrian” and “tiberius”. These are the hostnames of my VMs, named after the Roman emperors: Hadrian, who built the defensive fortification known as Hadrian's Wall to keep the barbarians out, and Tiberius, who replenished the imperial treasury. Feel free to use any hostname that makes it easy to remember which server is which.

To configure the virtual hosts and set up the proxy rules, edit the file mod_wl_ohs.conf:

cd /app/oracle/product/fmw11g/webtier_11.1/instances/instance1/config/OHS/ohs1
vi mod_wl_ohs.conf

And add a section similar to the following, noting that your hostnames and port numbers may differ:

LoadModule weblogic_module   "${ORACLE_HOME}/ohs/modules/mod_wl_ohs.so"

<IfModule weblogic_module>
  Debug ON
  WLLogFile /tmp/weblogic.log

  Listen 7778
  <VirtualHost *:7778>
    <Location />
        SetHandler weblogic-handler
        WebLogicHost tiberius
        WebLogicPort 8006
    </Location>
  </VirtualHost>

  Listen 7779
  <VirtualHost *:7779>
    <Location />
        SetHandler weblogic-handler
        WebLogicHost tiberius
        WebLogicPort 7103
    </Location>
  </VirtualHost>

</IfModule>

This will proxy all requests made to hadrian on port 7778 to the Endeca server running Workbench/XM on tiberius:8006, and all requests made on port 7779 to the ATG server running on tiberius:7103. It is possible to proxy from a single host, but you would need to define location handlers for all the various proxy paths (/atg, /ifcr, /preview, /rest, /js, /dojo-1, …) because ATG does not keep all of its URL paths under an “/atg” context.

For these changes to take effect, you will need to stop and restart OHS:

cd /app/oracle/product/fmw11g/webtier_11.1/instances/instance1/bin
opmnctl stopall
opmnctl startall

Afterwards, the new URL for accessing the Endeca Workbench/XM through OHS/OAM will be:
http://hadrian:7778/ifcr

And the new URL for accessing the ATG BCC through OHS/OAM will be:
http://hadrian:7779/atg/bcc

For more information on configuring Virtual Hosts or the WebLogic Proxy Plug-in, you can also consult the following resources:

Apache Virtual Host Documentation
Using Web Server Plug-Ins with Oracle WebLogic Server

Enabling OAM for ATG Commerce BCC

Next we need to configure ATG Commerce to accept OAM authenticated users. The easiest way to configure OAM on ATG Commerce is to use the ATG Configuration and Installation Manager (CIM). CIM is a text-based application that simplifies configuration for ATG products. You can launch it by executing cim.sh from the “$ATG_INSTALL/ATG/home/bin” directory. At the end of the Product Selection phase of CIM configuration, you have the option to enable product add-ons. One of the product add-on options is “Single Sign On (SSO)”. To enable OAM-based SSO with ATG Commerce, you must enable the SSO add-on, and then select “OAM Authentication” as the SSO authentication model. Below is a sample transcript of such a session.

[oracle@localhost]$ /app/oracle/product/atg/ATG/home/bin/cim.sh

The following installed ATG components are being used to launch:
  ATGPlatform version 11.0 installed at /app/oracle/product/atg/ATG

Nucleus running

     Oracle ATG Web Commerce Configuration Installation Manager

-------START OPSS SERVICES------------------------------------------------------
enter [h]Help, [q]Quit to exit 

Starting the Oracle Platform Security Services (OPSS)

=======CIM MAIN MENU============================================================
enter [h]Help, [q]Quit to exit 

Choose the task you want to perform:
  [1]  Database Configuration - Done
  [2]  Configure OPSS Security - Done
  [3]  Server Instance Configuration - Done
  [4]  Application Assembly & Deployment - Done
  [R]  Set the Administrator Password - Done
  [P]  Product Selection - Done (ATG REST & ATG Site Administration & 
ATG-Endeca Integration & Oracle ATG Commerce Service Center & Oracle Commerce 
Reference Store & ATG Content Administration)
  [A]  Select Application Server - Done (Tomcat)
 *[C]  Custom CIM Plugin Launcher 

 > P

-------ANALYZING PRODUCT DIRECTORIES--------------------------------------------

Please wait as CIM analyzes your product folders. . . . . . . . . . . . 
Analysis complete: 6 seconds

-------WARNING------------------------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

Changing your product selection may require changes to your configuration. 
Database Configuration, Server Instance Configuration, and Application 
Assembly and Deployment will need to be redone. Are you sure you want to 
continue?
 
 *[C]  Continue
  [A]  Cancel

 > C

-------PRODUCT SELECTION--------------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

Select product you wish to configure by entering the corresponding item number.

  (Searching for products... done.)

  Choose one of the following options: (* = Currently selected )

  [1]  ATG Platform - 
        Includes, optionally, data warehouse components
 
 *[2]  ATG REST - 
        RESTful Web Services
 
  [3]  WebCenter Sites Extensions - 
        Includes ATG Platform and Endeca Reader.
 
 *[4]  ATG Site Administration - 
        Includes ATG Platform and Content Administration
 
 *[5]  ATG-Endeca Integration - 
        Includes ATG Platform. Select this option when Endeca is used.
 
 *[6]  ATG Content Administration - 
        Includes ATG Platform.  Optional: Preview
 
  [7]  ATG Commerce - 
        Includes ATG Platform and Content Administration. Optional: data 
        warehouse components, Preview and Merchandising UI
 
  [8]  Endeca Reader - 
        Includes ATG Platform. Select this option when Endeca is used to 
        import data to ATG.
 
 *[9]  Oracle ATG Commerce Service Center - 
        Agent-facing commerce application
 
 *[10]  Oracle Commerce Reference Store - 
        Includes the ATG platform, ATG-Endeca Integration, ATG Content 
        Administration, Site Administration, Oracle ATG Web Commerce, and 
        Oracle ATG Web Commerce Merchandising. Optional: data warehouse 
        components and Preview
 
  [D]  Done

Select one or more > D

-------ENDECA SEARCH------------------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

The following addon(s) have been automatically included for the selected 
product: Endeca Search

-------MERCHANDISING UI---------------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

The following addon(s) have been automatically included for the selected 
product: Merchandising UI

-------CHOOSE ADDONS :----------------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

  Choose AddOns :
  [1]  Reporting
  [2]  Staging Server
  [3]  Dedicated Lock Servers
 *[4]  Single Sign On (SSO)
  [5]  Abandoned Order Services
 *[6]  Preview Server
  [D]  Done

Select zero or more > D

-------SSO AUTHENTICATION-------------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

  SSO Authentication
 *[1]  Commerce Only SSO Authentication
  [2]  OAM Authentication

Select one > 2

-------INCLUDE DEMO APPLICATION:------------------------------------------------
enter [h]Help, [m]Main Menu, [q]Quit to exit 

  Include Demo Application:
  [1]  Quincy Funds Demo
  [D]  Done

Select zero or one > D

Once the SSO add-on has been enabled, and OAM Authentication selected as the authentication method, you can configure your production and publishing servers, like you would in a typical ATG Commerce installation, but use the OHS server values instead. So, where you see:

  • Fully-qualified Workbench Hostname – use the complete hostname of the OHS server.
  • Workbench Port Number – use the OHS virtual host port that proxies to Endeca Workbench.
  • OAM Web Server Hostname – use the hostname of the OHS server instead.
  • OAM Web Server Port – use the OHS virtual host port that proxies to ATG BCC.

For example, here are the values that I used throughout my configuration:

Fully-qualified Workbench Hostname  : hadrian.us.oracle.com
Workbench Port Number               : 7778
OAM Remote User Http Header Name    : OAM_REMOTE_USER
OAM Web Server Hostname             : hadrian
OAM Web Server Port                 : 7779
Webgate Logout URL                  : /oamsso/logout.html?end_url=/atg/bcc?dummy=1

After completing the CIM configuration, the following files should have been added or modified:

/app/oracle/product/atg/ATG/home/servers/atg_pub/localconfig
     /atg/dynamo/servlet/dafpipeline
          OamRemoteUserServlet.properties
          DynamoHandler.properties
          AccessControlServlet.properties
     /atg/dynamo/servlet/pipeline
          RedirectURLValidator.properties
     /atg/endeca
          ApplicationConfiguration.properties
     /atg/remote/controlcenter/service
          ControlCenterService.properties
     /atg/userprofiling
          InternalProfileFormHandler.properties
     /atg/userprofiling/oam
          Configuration.properties
          NonTransientLogoutAccessController.properties
     /atg/web/assetmanager/userprofiling
          NonTransientAccessController.properties
/app/oracle/product/atg/ATG/home/servers/atg_prod/localconfig
     /atg/endeca
          ApplicationConfiguration.properties

If your environment is already configured and you don’t want to use CIM, it’s probably best to run CIM in a different environment to get the modified files, then run diff checks against the above files to determine what changes need to be applied, and manually apply them.

Once the above procedure is complete, we’ll have all we need to accept OAM authenticated users from the ATG BCC, but we still need to create an OAM policy to secure the application and redirect to the OAM SSO Login Page when unauthenticated requests to port 7779 are made. We’ll get into that after first discussing the additional changes necessary for Endeca OAM integration.

For additional information on the ATG OAM integration, you can also refer to the following:

Using Oracle Access Management for Single Sign On
Installing the Oracle Access Manager Integration Component

Enabling OAM for Endeca Workbench/XM

Like ATG, Endeca also needs to be configured to accept OAM authenticated users, but unlike ATG, there is no CIM configuration wizard for Endeca. The changes must be made by manually editing Endeca configuration files. Outlined below is the procedure that I followed to configure Endeca for OAM SSO.

  1. Edit the webstudio.properties file:
    cd /app/oracle/product/endeca/ToolsAndFrameworks/11.0.0/server/workspace/conf
    vi webstudio.properties
    

    And apply the following changes:

    a) Set useOAM to true, and set logoutURL as follows:

    # OAM Authentication
    com.endeca.webstudio.useOAM=true
    #com.endeca.webstudio.oam.identityAssertionValidation=true
    #com.endeca.webstudio.oam.keyStore=oamkeystore.ks
    #com.endeca.webstudio.oam.keyStoreType=JKS
    #com.endeca.webstudio.oam.keyStorePassword=oampass
    com.endeca.webstudio.oam.logoutURL=/ifcr/system/sling/logout.html?oam.logout.url=/oamsso/logout.html%3Fend_url=/ifcr
    

    b) Set useSSO to false, and comment out the Commerce SSO section:

    # Commerce SSO Authentication
    com.endeca.webstudio.useSSO=false
    #com.endeca.webstudio.sso.loginURL=http://localhost:38840/sso/login
    #com.endeca.webstudio.sso.controlURL=http://localhost:38840/sso/control
    #com.endeca.webstudio.sso.logoutURL=http://localhost:38840/sso/logout
    #com.endeca.webstudio.sso.validationURL=http://localhost:38840/sso/validate
    #com.endeca.webstudio.sso.keepAliveURL=http://localhost:38840/sso/keepAlive
    #com.endeca.webstudio.sso.keepAliveFrequency=1800
    
  2. In the same directory, edit the ws-extensions.xml file, and change the url attributes to point to the OHS virtual host for the ATG BCC:
    <extension id="bcc-home" defaultName="BCC" defaultDescription ="BCC"
      url="http://hadrian:7779/atg/bcc"
        externalURL="true"/>
        <extension id="bcc-access-control"
          defaultName="BCC Access Control"
          defaultDescription="BCC Access Control"
          role="admin"
          url="http://hadrian:7779/ControlCenter/application/accesscontrol"
          externalURL="true"/>
    </extensions>
    
  3. Edit the file Login.conf, and uncomment the Webstudio section. Then modify the serverInfo, serviceUsername, and servicePassword properties to point to the same LDAP repository that OAM checks against. In my configuration, that was the WebLogic Server Embedded LDAP. An example of my file is shown below:
    Webstudio {
        com.endeca.workbench.authentication.ldap.WorkbenchLdapLoginModule required
        serverInfo="ldap://hadrian:7001"
        serviceUsername="cn=Admin"
        servicePassword="welcome1"
        serviceAuthentication="simple"
        authentication="simple"
        useSSL="false" keyStoreLocation="/app/oracle/product/endeca/ToolsAndFrameworks/11.0.0/server/workspace/conf/webstudio.jks"
        keyStorePassphrase="keypass"
    
        // The query used to look up a user in the LDAP directory and
        // templates that extract information from the user object
        userPath="/ou=people,ou=myrealm,dc=idm_domain??sub?(&(objectClass=person)(uid=%{#username}))"
        userTemplate="%{#uid}"
        firstNameTemplate="%{#givenName}"
        lastNameTemplate="%{#sn}"
        emailTemplate="%{#mail}"
    
        // The query used to look up a group in the LDAP directory and
        // templates that extract information from the group object
        findGroupPath="/ou=groups,ou=myrealm,dc=idm_domain??sub?(&(objectClass=group)(cn=%{#groupname}))"
        findGroupTemplate="%{#dn:0}"
        groupEmailTemplate="%{#mail}"
    
        // The query and template used to fetch the groups associated
        // with a user when the user logs in to Web Studio
        groupPath="/ou=groups,ou=myrealm,dc=idm_domain??sub?(member=%{#dn})"
        groupTemplate="%{#dn:0}"
    ;
    };
    
  4. The servicePassword specified in step 3 above is not the password for the weblogic or admin user, but rather the password for the WebLogic Embedded LDAP admin user (“cn=Admin“). The password you specify in the servicePassword field must match the Embedded LDAP credential configured in WebLogic Server.
    To set this password, log into the WebLogic Admin Console for the OAM environment (http://hadrian:7001/console) and navigate to “idm_domain ▶ Security ▶ Embedded LDAP”, and change the value of the credential field to use the same password. Then save and apply your changes.
    commerce+oam_wls_sec_realm
  5. While you’re in the WebLogic Admin Console, add a user named “admin”. Navigate to “Security Realms ▶ myrealm ▶ Users and Groups ▶ Users”, and click the “New” button to add a new user named admin. Don’t worry about assigning groups to this user because authorization will be done in the BCC and in Workbench.
    commerce+oam_wls_admin_user
  6. Log into the Endeca Workbench. Navigate to User Management, click on the admin user and change the Source from Workbench to LDAP:
    commerce+oam_wb_admin_user

Newly created users must be synchronized between WebLogic Server, Workbench, and the ATG Profile Repository. That is, if you create a new user in the WebLogic security realm, and you want that user to be able to log into Workbench as well as the ATG BCC, then you must also create that user in Workbench and in the ATG Profile Repository.

The procedure outlined here is sufficient if Workbench and OAM are both behind the same firewall, but if that’s not the case, you may want to enable trust between the two systems by enabling Identity Assertion Validation.

More information on IA validation and configuring Endeca with OAM can be found in:

Oracle Endeca Commerce Administrator’s Guide

Configuring OAM Security Policies for BCC and Workbench/XM

The final steps of this process are to configure the security policies that protect the BCC and Workbench/XM. These applications already have their own individual login pages, but this procedure will allow them to use a single, shared login page that bypasses the application-specific logins and requires users to sign in only once to access both applications.

All security policy configuration is done using the Oracle Access Management Console, so start by opening a browser and logging into the OAM console (http://hadrian:7001/oamconsole).

  1. Register a WebGate agent:
    a) From the Launchpad, under Quick Start Wizards, select “SSO Agent Registration”.
    b) As the type, select “11g Webgate” and click Next.
    c) Enter “OHS1_WebGate” as the name, and “hadrian.us.oracle.com” as the preferred host.
    commerce+oam_oamconsole_wgagent

    d) Click Finish.

  2. Create Host Identifiers for the ATG and Endeca servers:
    a) From the Launchpad, under Access Manager, select “Host Identifiers”.
    b) Click the “Create Host Identifier” button.
    c) Use the name “endeca_wb1”, and “hadrian.us.oracle.com:7778” for host and port, then click Apply.
    commerce+oam_oamconsole_hostid

    d) Repeat for ATG server, using:

    Name: atg_pub1
    Description: Host identifier for ATG Publishing
    Host Name: hadrian.us.oracle.com
    Port: 7779
  3. Create an Application Domain:
    a) From the Launchpad, under Access Manager, select “Application Domains”.
    b) Click the “Create Application Domain” button. Enter the following, and click Apply.
    Name: ATG/Endeca
    Description: Policy objects enabling integration with ATG and Endeca.
  4. Create an Authentication Policy:
    a) On the newly created application domain, select the “Authentication Policies” tab.
    b) Click the “Create Authentication Policy” button.
    c) Name the policy “Endeca” and select “LDAPScheme” for authentication. Then click Apply.
    commerce+oam_oamconsole_atnpolicy

    d) Click the “Duplicate” button to create another similar policy, this time using the following:

    Name: ATG
    Description: ATG SSO using LDAP authentication.
  5. Create an Authorization Policy:
    a) Return to the ATG/Endeca application domain, and select the “Authorization Policies” tab.
    b) Click the “Create Authorization Policy” button, and name the policy “open”.
    c) Select the “Conditions” tab, and add a condition of type True (name will auto-populate).
    d) Select the “Rules” tab, and in the “Allow Rule” section, move the TRUE condition from Available to Selected, and click Apply.
    commerce+oam_oamconsole_rules
  6. Define the Resource URLs to protect:
    a) Return to the ATG/Endeca application domain, and select the “Resources” tab.
    b) Click the “New Resource” button, and create a new resource of type HTTP with:
    Host Identifier: endeca_wb1
    Resource URL: /**
    Protection Level: Protected
    Authentication Policy: Endeca
    Authorization Policy: open
    commerce+oam_oamconsole_resources

    c) Click the “Duplicate” button to create another similar resource, this time using the following:

    Host Identifier: atg_pub1
    Resource URL: /**
    Protection Level: Protected
    Authentication Policy: ATG
    Authorization Policy: open
  7. Conclusion

    Once you’ve completed the procedures above, and have restarted all of your servers, your environment will be SSO enabled using Oracle Access Manager. Your users should use the frontend host addresses to log into the BCC and XM and will be able to toggle between the two consoles without having to re-authenticate.

    To verify your configuration, try logging into either of the following two URLs (hostname may differ):
    http://hadrian:7778/ifcr
    http://hadrian:7779/atg/bcc

    If everything is configured correctly, you will be presented with the Oracle Access Manager Login page:

    commerce+oam_oam_loginpage

    Log in using the password specified when creating the admin user in the WebLogic Admin Console (welcome1), and you should proceed to either the BCC or XM management console.

    From the BCC, under “Oracle Commerce Tools” section, you should see a link for the Endeca Workbench:

    commerce+oam_bcchome

    Clicking on the link will navigate to the Endeca Workbench without requiring you to authenticate again. From the Endeca Workbench/XM, you should also see a link that can take you back to the ATG BCC:

    commerce+oam_xmhome

    Hopefully this article provides the necessary insight to administrators, architects, and developers who are looking to integrate OAM into their Commerce solutions. Please keep in mind that the procedures outlined herein are intended for development environments, and that additional steps may be required for production environments. Please make sure to also review the product literature when configuring your environments.

Debugging CAS with the Endeca RecordStore Inspector


Introduction

When it comes to debugging Endeca Content Acquisition System (CAS) related issues, there are few tools that Endeca developers have at their disposal to aid them in their troubleshooting. Most know to review the CAS service logs, but occasionally an issue arises where a peek inside a record store can be very revealing. If you're a savvy Endeca developer, you already know that you can export record store content using the “recordstore-cmd” command. Combined with a good text editor, this command can be very useful, but CLIs can be tedious to work with at times. So when I recently ran into such an issue, I decided to write my own visual tool for inspecting Endeca record stores, which I aptly named the Endeca RecordStore Inspector. In this article, I introduce the Endeca RecordStore Inspector utility and show how it can be used to debug Endeca CAS related issues.

Main Article

I was recently assisting with a CAS issue in which the CAS Console was reporting failed records for an incremental crawl of a Record Store Merger, one of whose data sources contained deleted records. The origin record store was reporting the deleted records correctly, but when the last-mile-crawl merger was run, it would report the deleted records as failed records.

For each failed record, I could see the following messages in cas-service.log:

WARN [cas] [cas-B2BDemo-last-mile-crawl-worker-1] com.endeca.itl.executor.ErrorChannelImpl.[B2BDemo-last-mile-crawl]: Record failed in MDEX Output with message: missing record spec record.id

DEBUG [cas] [cas-B2BDemo-last-mile-crawl-worker-1] com.endeca.itl.executor.ErrorChannelImpl.[B2BDemo-last-mile-crawl]: Record failed in MDEX Output (MdexOutputSink-826115787) with Record Content: [Endeca.Action=DELETE]

The messages were not intuitive at first, but I could tell that a different record spec identifier was being used. To see what was going into the record stores, I created a tool for visualizing their contents in a familiar tabular format. Using this tool, I could see that the records contained both an “Endeca.Id” property and a “record.id” property:

However, when one of the source files was removed and the acquisition re-run, the new generation contained the delete records with only the “Endeca.Id” property:

So when the last mile record store merger was run, it didn’t know how to merge the delete records because the record spec identifier (as well as the Record ID property on the data sources) had been changed to “record.id”, thereby producing the above warning message (“missing record spec record.id“) for the DELETE action entries.

Of course, the same diagnosis could have been made using recordstore-cmd and a text editor, but some things are easier done in a GUI; for example, sorting records by a specific column. The RecordStore Inspector allows you to sort on any column, as well as filter which columns are visible using regular expression syntax. You can open two different generations of a record store and compare them side-by-side. You can even export the contents of a record store (with or without filters applied) to a comma-separated value (CSV) text file, Microsoft Excel file, or Endeca Record XML file. These sorts of operations are more difficult when using just recordstore-cmd and a text editor, and my goal in creating this tool was to make the Endeca community more productive in diagnosing CAS related issues on their own.

About the Endeca RecordStore Inspector

The Endeca RecordStore Inspector utility was written using JavaFX 8 and Endeca 11.1, and runs on Windows, Linux, or any environment that supports the Java 8 runtime. Below I've provided download links for two versions: a portable version optimized for Windows, and a self-contained Java jar file for all other environments. To run the Windows version, simply extract the contents of the attached archive and double-click the rs_inspector.exe file. This version includes a Java 8 runtime environment, so there is no need to install Java or Endeca to run it. To run the self-contained jar file, you will need a Java 8 Runtime Environment installed. If one is already present, just copy the attached file below and run the command “java -jar rs_inspector-1.0-all.jar” to launch the application.

When the application has started you can press CTRL-R to select the record store and generation that you want to view. If your CAS Server runs on a host/port other than localhost:8500, then you can use ALT-S to change the default settings.

Once the record store loads, you can further refine the view using Java regular expression syntax in the column and value fields. This will restrict the columns and rows visible to only records matching the regular expression syntax specified. For example, to view only the columns for “product.id” and “product.short_desc” you can specify a Column Text filter of “product\.id|product\.short_desc” and click the Apply Filter button. To further refine the view to show only products with an id value greater than 3000000, you can use a Value Text filter of “[3-9][0-9]{6}[0-9]*|[1-9][0-9]{7}[0-9]*”.
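
As a quick sanity check, the Value Text filter above can be verified with ordinary Java regular expressions (the id values used here are made up):

import java.util.regex.Pattern;

public class FilterCheck {
    public static void main(String[] args) {
        Pattern idFilter = Pattern.compile("[3-9][0-9]{6}[0-9]*|[1-9][0-9]{7}[0-9]*");
        System.out.println(idFilter.matcher("2999999").matches());  // false: below 3,000,000
        System.out.println(idFilter.matcher("3000001").matches());  // true: seven digits starting with 3
        System.out.println(idFilter.matcher("12345678").matches()); // true: any eight-digit id
    }
}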

It is important to point out that this tool currently loads all rows of the selected record store into the table view, so if your JVM doesn't have sufficient memory to hold all of the data, you will receive an OutOfMemoryError. If your environment does have enough memory, you can increase the JVM heap settings (-Xmx) to accommodate your record store; if the record store is larger than 2 GB, you should run the RecordStore Inspector in a 64-bit JVM. If memory in your environment is limited, then you may not be able to load your record store using this tool. Perhaps later versions will offer the ability to incrementally load data into the view; if this is important to you, please let me know.

Summary

Endeca content acquisition can be somewhat of a black box. To provide some transparency into this process, I've created the Endeca RecordStore Inspector, a visual tool intended to aid in debugging issues pertaining to Endeca CAS data ingestion. In this article we've seen one example of how this tool was used to make sense of a seemingly enigmatic error message, but its applications are much broader in scope, not only as a debugging aid, but as a medium for understanding Endeca CAS in general.

Below are links to download the Endeca RecordStore Inspector. Please note that this tool is provided “as-is”, without guarantee or warranty of any kind. It is not part of the Endeca or Oracle Commerce product suite, and therefore not supported by Oracle. However, a link to the complete source code is provided below, and you are free to fix any issues or enhance the tool in any way you like.

Download the portable Windows version

Download the self-contained jar version

The source code and latest version of this utility is maintained on GitHub:
https://github.com/dprantzalos/Endeca-RecordStore-Inspector

If you find this tool useful, please let me know.
