  1. Changing selection formulae programmatically is a good solution if there is just one view (or a small number of views) and if the database is not that big. In the case of my bloggregator database, neither of those is true. I display three date ranges: one day, two days, and seven days; and I generate those displays for all the RSS feeds that I pull in, or for any one RSS feed -- and I have upwards of 200 feeds coming in and a total of more than 120,000 documents in the database at present.

    I could have gone with three "all" views plus three views per feed, but the massive number of view rebuilds that would occur every night when the selection formulae are updated would scare me even if the database were a lot smaller, so there was no way I was going to go this way.

    In theory, I could use just six views -- three uncategorized, and three categorized by feed -- and adjust the selection formulae nightly. Changing the selection formulae would result in complete rebuilds of all six views every night, but with only six views this might not be too bad. There are two reasons why I didn't do this, though.

    The first reason I didn't do it is that -- just as is the case with readernames fields -- single category view displays can get quite slow when there are thousands of documents that have to be skipped in order to locate the ones that are actually displayed. In my own use, the number of documents that would be selected in the seven day categorized view is only around 5000 at any given time, and that's below the threshold where performance issues tend to set in, so I could probably have used the views with show single category. But my goal was to build a behind-the-firewall re-aggregator that could scale to thousands of feeds, with potentially tens of thousands of current documents in a seven day span.

    The second reason that I didn't do it is that you can't control the order of view rebuilds. There's no guarantee that the order in which you update the selection formulae is the order in which the rebuilds will occur. Although the worst thing I can think of happening in my application if views were updated in an unusual order is that there could be a significant time lag during which many documents appear in the 1 Day\All view but aren't in the 1 Day views for their particular feed (or vice versa), it still seemed to me that making a major design feature dependent on the order of view rebuilds was a bad idea.

    So, I went with the folder approach. Or, I guess I should say the massively multiplied folder approach ;-) I set up folders for All\1 Day, All\2 Days, and All\7 Days, and the code also sets up a 1 day, 2 day and 7 day folder for each individual feed that is added. As they are created, new documents get put in the 1, 2 and 7 day folders for their particular feed, and also in the 1, 2, and 7 day "all" folders. A nightly agent traverses the All\7 Days folder and removes documents as needed -- from that folder and also from the others as needed. The performance has been consistently good, the massive number of folders required has (so far) not been a problem, and while the agent takes quite a while to run, the performance impact on the server while it is running is barely noticeable.
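The folder bookkeeping described above can be sketched roughly as follows. This is only an illustrative model in Python, not the actual agent code: folders are plain dictionaries keyed by a (scope, days) pair, and the names `file_document`, `nightly_purge`, and the sample feed and document IDs are all made up for the example. It assumes a document leaves a folder once it is older than that folder's window.

```python
from datetime import datetime, timedelta

# Folder windows in days: the 1, 2 and 7 day folders (assumed cut-offs).
WINDOWS = (1, 2, 7)


def file_document(folders, doc_id, feed, created):
    """File a new document in the 1/2/7 day folders for "All" and for its feed."""
    for days in WINDOWS:
        for scope in ("All", feed):
            # Each folder maps document id -> creation time (hypothetical model).
            folders.setdefault((scope, days), {})[doc_id] = created


def nightly_purge(folders, now):
    """Walk the All\\7 Days folder and drop documents from every folder
    whose window they have outgrown."""
    seven_day_all = folders.get(("All", 7), {})
    # Iterate over a copy, since entries may be deleted from this folder too.
    for doc_id, created in list(seven_day_all.items()):
        age = now - created
        for (scope, days), members in folders.items():
            if doc_id in members and age > timedelta(days=days):
                del members[doc_id]


# Small demonstration with made-up feeds and documents.
folders = {}
now = datetime(2024, 1, 10)
file_document(folders, "doc-a", "feed-x", now - timedelta(hours=30))
file_document(folders, "doc-b", "feed-x", now - timedelta(days=8))
file_document(folders, "doc-c", "feed-y", now - timedelta(hours=2))
nightly_purge(folders, now)
```

Because only the All\7 Days folder is traversed, each document is looked at once per night no matter how many feed folders exist, and nothing depends on view rebuild order.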
