Also, I guess the retrieval of the change data really should use GET, not POST.
David, we had thought about allowing input scopes to include folders, but decided to keep it simpler for the providers. The client could do the scoping by using getObjectParents. I agree it would be easier on the clients if the entries already had the path(s), but we did not talk about including the paths in the entries, and I would be curious about the opinions of the ECM providers on whether that could be done with decent performance. But for the client to index based on scope would mean the client then also has to worry about whether filing operations would be treated as content changes (which the proposal states is a repository choice).
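To make the getObjectParents suggestion concrete, here is a minimal sketch of the client-side scoping. It is only an illustration: `get_parents` is a hypothetical callable standing in for whatever binding the client uses to invoke getObjectParents, and the object IDs and the `PARENTS` table are made-up toy data, not anything from the proposal.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: takes an object ID, returns the IDs of its parent folders.
GetParents = Callable[[str], Iterable[str]]


def in_scope(object_id: str, scope_folder_id: str, get_parents: GetParents) -> bool:
    """Return True if scope_folder_id is the object itself or one of its ancestors."""
    seen = set()
    stack = [object_id]
    while stack:
        current = stack.pop()
        if current == scope_folder_id:
            return True
        if current in seen:
            continue  # multi-filed objects can reach the same folder twice
        seen.add(current)
        stack.extend(get_parents(current))
    return False


def filter_changes(changed_ids: Iterable[str], scope_folder_id: str,
                   get_parents: GetParents) -> List[str]:
    """Keep only the changed objects that live under the scope folder."""
    return [oid for oid in changed_ids if in_scope(oid, scope_folder_id, get_parents)]


# Toy repository layout (assumption, for illustration only).
PARENTS = {
    "doc1": ["folderA"],
    "doc2": ["folderB"],
    "doc3": ["folderA", "folderB"],  # multi-filed document
    "folderA": ["root"],
    "folderB": ["root"],
    "root": [],
}


def toy_get_parents(object_id: str) -> List[str]:
    return PARENTS.get(object_id, [])


scoped = filter_changes(["doc1", "doc2", "doc3"], "folderA", toy_get_parents)
```

Note that this costs at least one parent lookup per changed object (more for deep hierarchies), which is the extra client-side work being traded against provider simplicity.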
Julian, I was just blindly borrowing the pattern from the query service of posting to a collection, but a GET in this case makes more sense.
There are a few unified search areas which I think would benefit from improved explanation:
- how a client might 'bootstrap' itself to bring itself up to the latest change identifier, especially if the repository has changesIncomplete=true.
- can a client bootstrap itself as of a certain point in time?
- explanation of the behavior options, for example:
  - describing that unified search could support both 'changelog-based systems (including changed values)' as well as 'what changed-based systems (including only the identifier, no values)'
  - describing that unified search may support either aggregation (lossy/condensed changelog) or verbatim (noncondensed changelog) systems.

A couple of comments from my colleagues at Oracle:
Contents of the change log

The following text seems to indicate that events cannot be omitted from the stream even if later made irrelevant:

"The order in which the CmisChangedObjectType instances appear in the output set is the order in which the events happened, oldest first, for each instance of the content described by CmisChangedObjectType. For example, if an item was created at time t, updated at time t+1 and then deleted at time t+2, the order in which the events appear in the output set is create at t, update at t+1, delete at t+2, though these events would not necessarily be grouped together in a page of responses or even in the same response page. This is done so that the service consumer can process the events in the order they appear in the result set without having to remember what events it has already processed for an object."

I think we should relax this and allow repository optimizations such as the following:
- If object x is later deleted, only that deletion needs to be reported. Creation and update can be omitted. (We still need to keep the delete even for an object created after the changeToken, because an initial full crawl has a view of the content that does not exactly correspond to that changeToken; it may have captured changes that happened between the start and end of the crawl.)
- When there is a sequence of creation and updates on object x, only the last update needs to be represented. This may look a bit awkward since the crawler may have to treat an update as a creation in its index, but this can save quite a few updates to the index for frequently changed items. Actually it seems that making a distinction between creation and update is not strictly necessary for the crawler.

------------------
REST binding

In the sample response, the single entry has the 3 types of cmis:changedObject. I assume that this is just to demonstrate those 3 possible forms, but that a single entry could contain only one of those.
I think the sample would be much more informative if it contained one entry of each type. I would especially like to see the very minimum set of information that needs to be included for a deleted item. It seems strange that each entry in this collection is not a 'normal' entry as you would find it in the other collection. I would expect the <cmis:properties> tag to remain a child of entry. cmis:changedObject should contain only 'new' information relevant to the notion of change (so only the type of change and time of change). Since both of those properties are simple, short, and strongly typed, I would expect them to be represented as attributes of cmis:changedObject.
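To illustrate the condensed change log suggested under "Contents of the change log", here is a rough sketch of what a repository-side coalescing step might look like. This is not taken from the proposal: `ChangeEvent`, its field names, and the string change types are assumptions made up for the example. The idea is simply to keep the last event per object, which covers both cases (a trailing delete replaces everything before it, and a run of create/updates collapses to the last update).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical stand-in for one change log entry (not a CMIS type)."""
    object_id: str
    change_type: str  # assumed values: "created", "updated", "deleted"
    time: int


def coalesce(events: Iterable[ChangeEvent]) -> List[ChangeEvent]:
    """Keep only the most recent event per object.

    Input is assumed to be ordered oldest first. The output is ordered by the
    time of each object's surviving (last) event.
    """
    latest: Dict[str, ChangeEvent] = {}
    for event in events:
        # Remove and re-insert so the dict order follows the last event seen.
        latest.pop(event.object_id, None)
        latest[event.object_id] = event
    return list(latest.values())


log = [
    ChangeEvent("x", "created", 1),
    ChangeEvent("y", "created", 2),
    ChangeEvent("x", "updated", 3),
    ChangeEvent("x", "deleted", 4),
    ChangeEvent("y", "updated", 5),
]
condensed = coalesce(log)
```

With this toy log, object x is reported only as deleted and object y only as updated, so a crawler consuming the condensed form would have to treat an update for an unknown object as a creation, as noted above.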
Has there been any consideration of allowing a client to scope the returned change set to a specific folder (including subfolders)? I think there are use cases where the search engine may not be required to crawl all content, or where multiple search engines may be set up to crawl a single repository (each indexing a separate part).
This might also support other use cases outside of search indexing, e.g. implementation of a poor man's (pull-based) change event queue.
Does each document (and folder) in the returned change set include its folder path? This can at least allow a client to filter the change set by folder.
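If entries did carry their folder path(s), the client-side filter could be as simple as the sketch below. Everything here is an assumption for illustration: the entry shape (a dict with `id` and `paths` keys) and the sample paths are invented, and nothing in the proposal says entries will actually include paths.

```python
from typing import Dict, Iterable, List


def is_under(path: str, folder: str) -> bool:
    """True if path is the folder itself or lies somewhere beneath it."""
    base = folder.rstrip("/")
    # Compare on a path-segment boundary so "/sites/hr" does not match "/sites/hr-archive".
    return path == base or path.startswith(base + "/")


def filter_by_folder(entries: Iterable[Dict], folder: str) -> List[Dict]:
    """Keep entries that have at least one path under the given folder."""
    return [e for e in entries if any(is_under(p, folder) for p in e["paths"])]


# Invented sample change set; a multi-filed document has more than one path.
change_set = [
    {"id": "a", "paths": ["/sites/hr/policy.doc"]},
    {"id": "b", "paths": ["/sites/hr-archive/old.doc"]},
    {"id": "c", "paths": ["/sites/eng/spec.doc", "/sites/hr/shared/spec.doc"]},
]

hr_changes = filter_by_folder(change_set, "/sites/hr")
hr_ids = [e["id"] for e in hr_changes]
```

The client would still receive the full change set and discard what it does not need, so this helps with filtering but not with the volume of data returned.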