diff --git a/docs/manual/developer/filters.html b/docs/manual/developer/filters.html
new file mode 100644
index 0000000000..b8ac610025
--- /dev/null
+++ b/docs/manual/developer/filters.html
@@ -0,0 +1,168 @@
+
+
+
+
+
+Warning - this is a cut 'n paste job from an email:
+   <022501c1c529$f63a9550$7f00000a@KOJ>
+
+There are three basic filter types (each of these is actually broken
+down into two categories, but that comes later).
+
+CONNECTION: Filters of this type are valid for the lifetime of this
+            connection.
+
+PROTOCOL:   Filters of this type are valid for the lifetime of this
+            request from the point of view of the client; that is,
+            from the time the request is sent until the time the
+            response is received.
+
+RESOURCE:   Filters of this type are valid for the time that this
+            content is used to satisfy a request. For simple
+            requests, this is identical to PROTOCOL, but internal
+            redirects and sub-requests can change the content without
+            ending the request.
+
+It is important to make the distinction between a protocol and a
+resource filter. A resource filter is tied to a specific resource; it
+may also be tied to header information, but the main binding is to a
+resource. If you are writing a filter and you want to know whether it
+is a resource or a protocol filter, the correct question to ask is:
+"Can this filter be removed if the request is redirected to a different
+resource?" If the answer is yes, then it is a resource filter. If it is
+no, then it is most likely a protocol or connection filter. I won't go
+into connection filters, because they seem to be well understood.
+
+With this definition, a few examples might help:
+Byterange: We have coded it to be inserted for all requests, and it is
+removed if not used. Because this filter is active at the beginning of
+every request, it cannot be removed when the request is redirected, so
+this is a protocol filter.
+
+http_header: This filter actually writes the headers to the network.
+This is obviously a required filter (except in the asis case, which is
+special and is dealt with below), so it is a protocol filter.
+
+Deflate: The administrator configures this filter based on which file
+has been requested.
+If we do an internal redirect from an autoindex page to an index.html
+page, the deflate filter may be added or removed based on the
+configuration, so this is a resource filter.
+
+The further breakdown of each category into two more filter types is
+strictly for ordering. We could remove it and allow only one filter
+type per category, but the order would tend to be wrong, and we would
+need to hack things to make it work. Currently, the RESOURCE filters
+have only one filter type, but that should change.
+
+How are filters inserted?
+This is actually rather simple in theory, but the code is complex.
+First of all, it is important that everybody realize that there are
+three filter lists for each request, but they are all concatenated
+together. The first list is r->output_filters, then
+r->proto_output_filters, and finally r->connection->output_filters.
+These correspond to the RESOURCE, PROTOCOL, and CONNECTION filters
+respectively. The problem previously was that we used a singly linked
+list to create the filter stack, and each insertion started from the
+head of the "correct" list. This meant that if I had a RESOURCE filter
+on the stack and I added a CONNECTION filter, the CONNECTION filter
+would be ignored. The reason is that we would insert the connection
+filter at the top of the c->output_filters list, but the filter at the
+end of the r->output_filters chain still pointed to the filter that
+used to be at the front of c->output_filters. This is obviously wrong.
+The new insertion code uses a doubly linked list, which has the
+advantage that we never lose a filter that has been inserted.
+Unfortunately, it comes with a separate set of headaches.
+
+The problem is that there are two different cases where we use
+sub-requests. The first is to insert more data into a response. The
+second is to replace the existing response with an internal redirect.
+These are two different cases and need to be treated as such.
+
+In the first case, we are creating the sub-request from within a
+handler or filter.
+This means that the next filter should be passed to the
+make_sub_request function, and the last resource filter in the
+sub-request will point to the next filter in the main request. This
+makes sense, because the sub-request's data needs to flow through the
+same set of filters as the main request. A graphical representation
+might help:
+
+Default_handler --> includes_filter --> byterange --> content_length --> etc
+
+If the includes filter creates a sub-request, then we don't want the
+data from that sub-request to go through the includes filter, because
+it might not be SSI data. So, the sub-request adds the following:
+
+Default_handler --> includes_filter -/-> byterange --> content_length --> etc
+                                    /
+Default_handler --> sub_request_core
+
+What happens if the sub-request's data is SSI? Well, that's easy: the
+includes_filter is a resource filter, so it will be added to the
+sub-request in between the Default_handler and the sub_request_core
+filter.
+
+The second case for sub-requests is when one sub-request is going to
+become the real request. This happens whenever a sub-request is created
+outside of a handler or filter, and NULL is passed as the next filter
+to the make_sub_request function.
+
+In this case, the resource filters no longer make sense for the new
+request, because the resource has changed. So, instead of starting from
+scratch, we simply point the front of the sub-request's resource
+filters at the front of the old request's protocol filters. This means
+that we won't lose any of the protocol filters, nor will we send this
+data through a filter that shouldn't see it.
+
+The problem is that we now use doubly linked lists for our filter
+stacks, and in this model it is possible for two lists to intersect.
+So, how do you handle the previous pointer? This is a very difficult
+question to answer, because there is no "right" answer: the previous
+pointer can stay on the original request or move to the sub-request,
+and either method is equally valid.
+
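The intersecting chains and the previous-pointer question can be made concrete with a small C model. This is a sketch under stated assumptions, not httpd's implementation. The types and functions (`filter`, `req`, `conn`, `insert_filter`, `make_sub_request`, `chain_to_string`) are hypothetical. The three lists are modeled as segments of one doubly linked chain in which an empty list's head aliases the head of the next list. The model adopts the policy this email goes on to choose: prev pointers stay on the original request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the filter chain described in this email.
 * These are NOT httpd's real types or functions. */

typedef enum { FT_RESOURCE, FT_PROTOCOL, FT_CONNECTION } ftype;

typedef struct req req;

typedef struct filter {
    const char *name;
    ftype type;
    struct filter *next;  /* toward the network */
    struct filter *prev;  /* toward the handler; never moved onto a sub-request */
    req *owner;           /* request that inserted this filter */
} filter;

typedef struct conn {
    filter *output_filters;  /* CONNECTION list head */
} conn;

/* An empty list's head points at the head of the next list, so the three
 * lists read as one concatenated chain starting at output_filters. */
struct req {
    filter *output_filters;        /* RESOURCE list head */
    filter *proto_output_filters;  /* PROTOCOL list head */
    conn *connection;
};

static filter **list_head(req *r, ftype t) {
    if (t == FT_RESOURCE) return &r->output_filters;
    if (t == FT_PROTOCOL) return &r->proto_output_filters;
    return &r->connection->output_filters;
}

/* Insert f at the front of its own list without dropping it from the chain. */
static void insert_filter(req *r, filter *f) {
    assert(r != NULL && f != NULL);
    filter **head = list_head(r, f->type);
    filter *old = *head;

    f->owner = r;
    f->next = old;
    f->prev = NULL;

    if (old != NULL && old->owner == r) {
        /* Same request: prev finds whoever pointed at the old head (for a
         * CONNECTION insert, the last PROTOCOL filter), so nothing is lost. */
        f->prev = old->prev;
        if (f->prev != NULL)
            f->prev->next = f;
        old->prev = f;
    } else if (old == NULL) {
        /* Nothing follows this list yet: append after the chain's tail. */
        filter *tail = r->output_filters;
        while (tail != NULL && tail->next != NULL)
            tail = tail->next;
        f->prev = tail;
        if (tail != NULL)
            tail->next = f;
    }
    /* Otherwise old belongs to the parent request: leave old->prev alone so
     * the prev pointers stay on the original request. */

    /* Empty lists in front of this one aliased old; move them to f. */
    if (f->type != FT_RESOURCE && r->output_filters == old)
        r->output_filters = f;
    if (f->type == FT_CONNECTION && r->proto_output_filters == old)
        r->proto_output_filters = f;
    *head = f;
}

/* next != NULL: the sub-request adds data, so its chain rejoins the parent
 * at `next`. next == NULL: internal redirect, so the resource filters are
 * dropped and the chain starts at the parent's protocol filters. */
static void make_sub_request(req *sub, req *r, filter *next) {
    sub->connection = r->connection;
    sub->proto_output_filters = r->proto_output_filters;
    sub->output_filters = (next != NULL) ? next : r->proto_output_filters;
}

/* Write the chain reachable from f as "A -> B -> C" into buf (size n). */
static void chain_to_string(const filter *f, char *buf, size_t n) {
    assert(buf != NULL && n > 0);
    buf[0] = '\0';
    for (; f != NULL; f = f->next) {
        if (buf[0] != '\0')
            strncat(buf, " -> ", n - strlen(buf) - 1);
        strncat(buf, f->name, n - strlen(buf) - 1);
    }
}
```

Suppose one request receives CORE, CONTENT_LENGTH, BYTERANGE and INCLUDES, followed by a late connection filter with the placeholder name CONN_LOG. The chain from `r->output_filters` then reads INCLUDES -> BYTERANGE -> CONTENT_LENGTH -> CONN_LOG -> CORE. The late connection filter is not lost, because CORE's prev pointer led back to CONTENT_LENGTH.

Now create a sub-request with `next` set to BYTERANGE and give it a placeholder SUB_REQ_CORE filter. Its chain reads SUB_REQ_CORE -> BYTERANGE -> ..., while BYTERANGE's prev still points at the main request's INCLUDES. This extra bookkeeping is the cost of keeping the main chain intact.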
+I looked at why we use the previous pointer. The only reason for it is
+to make adding new filters easier. With that being said, the solution I
+chose was to make the previous pointer always stay on the original
+request.
+
+This requires some more complex logic, but it works for all cases. My
+concern with having it move to the sub-request is that, in the more
+common case (where a sub-request is used to add data to a response),
+the main filter chain would be wrong. That didn't seem like a good idea
+to me.
+
+asis:
+The final topic. :-) mod_asis is a bit of a hack, but the handler needs
+to remove all filters except the connection filters, and then send the
+data. If you are using mod_asis, all other bets are off.
+
+The very last point is that this code was so hard to get right because
+we had hacked so much to force it to work. I wrote most of the hacks
+originally, so I am very much to blame. However, now that the code is
+right, I have started to remove some hacks. Most people should have
+seen that the reset_filters and add_required_filters functions are
+gone. Those functions inserted protocol-level filters for error
+conditions; in fact, both functions did the same thing, one after the
+other, which was really strange. Because we no longer lose protocol
+filters in error cases, those hacks went away. The HTTP_HEADER,
+Content-length, and Byterange filters are all added in the
+insert_filters phase, because adding them earlier caused some
+interesting interactions. Now, those could all be moved to be inserted
+with the HTTP_IN, CORE, and CORE_IN filters. That would make the code
+easier to follow.
+
+
+
+
+
diff --git a/docs/manual/developer/index.html b/docs/manual/developer/index.html
index c825f96948..f0555ecb8e 100644
--- a/docs/manual/developer/index.html
+++ b/docs/manual/developer/index.html
@@ -39,6 +39,8 @@
+