From: Alexander Klimetschek <alexander.klimetschek <at> mindquarry.com>
Subject: Re: Broken caching of servlet: source in some cases
Newsgroups: gmane.text.xml.cocoon.devel
Date: Saturday 21st April 2007 20:54:51 UTC
Grzegorz Kossakowski wrote:
>> I've talked with Reinhard about this issue and we agreed that old
>> Last-Modified is needed when validity is returned. We'll have to maintain
>> our own store with these values.
>> I'm working on the definitive solution to this problem.
>>
> 
> Done. Could you give it a try?
> Changes are described in comments of source code and svn log.
> 

Great! I can try it on Monday; I have to do other things this weekend.

But I found two other problems with the order in which methods are called
on the source.

1) A (Resource)Reader will always call getLastModified() before
getValidity(), which breaks caching completely: it starts a servlet
connection without the If-Modified-Since header set. But it looks like this
could be fixed with your new changes! My first idea was to return -1 from
getLastModified() until the real value is known, i.e. after the connection
has been executed. But I am not sure whether this will break other use
cases.
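A minimal sketch of the -1 idea, assuming a simplified, hypothetical source class (LazyLastModifiedSource, execute() and connectionCount() are illustrative names, not Cocoon's API). getLastModified() never opens a connection itself and reports -1 until a call has actually run, so asking for it early no longer triggers an unconditional servlet call:

```java
/**
 * Hypothetical sketch (not Cocoon's actual servlet source): getLastModified()
 * reports -1 ("unknown") until a connection has really been executed.
 */
class LazyLastModifiedSource {
    private final long backendLastModified; // stands in for the servlet's answer
    private long lastModified = -1;         // -1 until a connection has run
    private int connections = 0;            // how many simulated calls were made

    LazyLastModifiedSource(long backendLastModified) {
        this.backendLastModified = backendLastModified;
    }

    /** Never opens a connection; returns -1 while the value is unknown. */
    long getLastModified() {
        return lastModified;
    }

    /**
     * Simulated servlet call. ifModifiedSince <= 0 means "no header sent".
     * Returns true if fresh data would be delivered, false for "not modified".
     */
    boolean execute(long ifModifiedSince) {
        connections++;
        lastModified = backendLastModified; // now the real value is known
        return ifModifiedSince <= 0 || backendLastModified > ifModifiedSince;
    }

    int connectionCount() {
        return connections;
    }
}
```

Whether other callers cope with -1 before the first call is exactly the open question raised above; the sketch does not settle it.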

(BTW: The method ServletConnection.connect() should be renamed to call() or
execute(). "connect" sounds like doing only the first step, "establishing a
connection", but it actually connects, gets the data and "closes" the
connection!)

2) The other problem happens when the validity is integrated into an
AggregatedValidity together with others, e.g. when using .
In that case it is possible that, although the source validity returns
valid (and there is no response data), the pipeline calls getInputStream().
This happens when the other validities are invalid and the decision is made
to retrieve fresh data from all sources. That was the mysterious last
bug ;-)
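To make the failure mode concrete, here is a simplified model of aggregation, assuming the Excalibur convention VALID = 1, UNKNOWN = 0, INVALID = -1. The class AggregationModel and its Part type are hypothetical, and the aggregation rule (any INVALID part invalidates the whole) is a simplification, not the actual AggregatedValidity code:

```java
import java.util.List;

/** Hypothetical model of why a "valid" source still gets getInputStream(). */
class AggregationModel {
    static final int VALID = 1, UNKNOWN = 0, INVALID = -1;

    /** One aggregated part: its validity result and a fetch counter. */
    static final class Part {
        final String uri;
        final int validity;
        int fetches = 0;

        Part(String uri, int validity) {
            this.uri = uri;
            this.validity = validity;
        }

        /** Stands in for Source.getInputStream(); only counts calls. */
        void getInputStream() {
            fetches++;
        }
    }

    /** Simplified rule: one INVALID part makes the whole aggregate INVALID. */
    static int aggregate(List<Part> parts) {
        boolean unknown = false;
        for (Part p : parts) {
            if (p.validity == INVALID) return INVALID;
            if (p.validity == UNKNOWN) unknown = true;
        }
        return unknown ? UNKNOWN : VALID;
    }

    /** On INVALID every part is re-read, including the valid ones. */
    static void process(List<Part> parts) {
        if (aggregate(parts) == INVALID) { // UNKNOWN handling omitted here
            for (Part p : parts) p.getInputStream();
        }
    }
}
```

The valid part ends up being read even though its own validity said nothing changed, which is why a conditional connection that returned no data is not enough for getInputStream().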

For this I would propose changing the getInputStream() implementation so
that it always does a connection without the If-Modified-Since header set,
regardless of whether there already was a connection (started from the
isValid() method). This will result in two full sitemap processings, but I
see no other solution.
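The proposal could be sketched as follows. RefetchingSource, execute() and isUnchangedSince() are hypothetical names, and the servlet is simulated by a timestamp and a fixed body; a null result from execute() stands for a "304 Not Modified" response without data:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the proposed getInputStream() behaviour. */
class RefetchingSource {
    private final long backendLastModified; // simulated servlet timestamp
    private final String backendBody;       // simulated servlet response
    private int connections = 0;

    RefetchingSource(long backendLastModified, String backendBody) {
        this.backendLastModified = backendLastModified;
        this.backendBody = backendBody;
    }

    /** Simulated servlet call; returns null for "304 Not Modified". */
    private String execute(long ifModifiedSince) {
        connections++;
        if (ifModifiedSince > 0 && backendLastModified <= ifModifiedSince) {
            return null; // not modified: no response body
        }
        return backendBody;
    }

    /** Validity check: a conditional request that may return no body. */
    boolean isUnchangedSince(long cachedLastModified) {
        return execute(cachedLastModified) == null;
    }

    /** Always unconditional, so data is available even after a 304. */
    InputStream getInputStream() {
        String body = execute(-1); // no If-Modified-Since header
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }

    int connectionCount() {
        return connections;
    }
}
```

The second connection is the cost mentioned above: a conditional check followed by a read processes the servlet twice.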

Alex


PS:

While debugging, I examined the entire caching algorithm in Cocoon, and
here are all the important bits and pieces I came up with from the point of
view of a Source developer. Some of it is noted on
http://cocoon.apache.org/2.1/userdocs/concepts/caching.html
but not everything, so I'd like to share it on the list for future work:

Sources & Caching in Cocoon
===========================

This is the typical order in which org.apache.excalibur.source.Source and
SourceValidity methods are called with regard to caching:

getURI()  <- used as cache key for the cached response and the cached
              validity

getLastModified()  <- called only by ResourceReader to set the
                       Last-Modified header if the value is > 0

SourceValidity.isValid()  <- called on cached (old) validity if found
                              in cache

getValidity()  <- called if the old cached validity returned 0 (UNKNOWN) on
                   isValid() or for putting the new data into the cache

SourceValidity.isValid(SourceValidity)
                <- called on cached validity with the new validity as
                   parameter

getInputStream()  <- called when any isValid() method returned -1 (INVALID),
                      but also when some other information outside the
                      current source forces new data to be fetched (e.g.
                      when the SourceValidity is put into an
                      AggregatedValidity together with others: one invalid
                      validity makes all sources invalid!)
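The order above can be modelled as a small decision procedure. This is a hypothetical, simplified sketch (CallOrderModel is not Cocoon's pipeline code): it assumes a cache hit for getURI(), omits getLastModified() since only ResourceReader calls it, treats UNKNOWN from isValid(SourceValidity) as "regenerate", and only records which methods are invoked. Placing the storing getValidity() call before getInputStream() is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model that logs the Source/SourceValidity call order. */
class CallOrderModel {
    static final int VALID = 1, UNKNOWN = 0, INVALID = -1;

    final List<String> log = new ArrayList<>();
    private final int cachedIsValid;        // result of cached.isValid()
    private final int cachedIsValidAgainst; // result of cached.isValid(newValidity)

    CallOrderModel(int cachedIsValid, int cachedIsValidAgainst) {
        this.cachedIsValid = cachedIsValid;
        this.cachedIsValidAgainst = cachedIsValidAgainst;
    }

    String getURI() { log.add("getURI"); return "servlet:/example"; }
    Object getValidity() { log.add("getValidity"); return new Object(); }
    void getInputStream() { log.add("getInputStream"); }
    int isValid() { log.add("isValid()"); return cachedIsValid; }
    int isValid(Object newValidity) { log.add("isValid(new)"); return cachedIsValidAgainst; }

    /** One request against an existing cache entry. */
    void process() {
        getURI();                      // cache key lookup (assumed hit)
        int result = isValid();        // ask the cached (old) validity first
        Object newValidity = null;
        if (result == UNKNOWN) {
            newValidity = getValidity();
            result = isValid(newValidity);
            if (result == UNKNOWN) {
                newValidity = null;    // discarded: it will be fetched again
            }
        }
        if (result == VALID) {
            return;                    // serve the cached response
        }
        if (newValidity == null) {
            newValidity = getValidity(); // validity for storing the new data
        }
        getInputStream();              // fetch fresh data
    }
}
```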

If the isValid(SourceValidity) method returns UNKNOWN, the new validity will
be refetched, so getValidity() is called a second time (!).
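Because getValidity() can be called twice within one request, a Source developer might want to compute the validity only once. A minimal sketch, assuming a source instance is not reused across requests (otherwise the memoized value would go stale); MemoizingValiditySource is a hypothetical name, and a Long timestamp stands in for a real SourceValidity:

```java
import java.util.function.Supplier;

/** Hypothetical source that computes its validity at most once. */
class MemoizingValiditySource {
    private final Supplier<Long> fetchLastModified; // simulated, possibly expensive call
    private Long validity;                          // null until first requested
    private int fetches = 0;

    MemoizingValiditySource(Supplier<Long> fetchLastModified) {
        this.fetchLastModified = fetchLastModified;
    }

    /** Repeated calls return the same value without fetching again. */
    Long getValidity() {
        if (validity == null) {
            fetches++;
            validity = fetchLastModified.get();
        }
        return validity;
    }

    int fetchCount() {
        return fetches;
    }
}
```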


-- 
Alexander Klimetschek
http://www.mindquarry.com