
Documents Created Since Agent Last Ran - An Alternative

With scheduled Notes agents we can easily target all the documents in the database that have been created and/or modified since the agent last ran.

[Screenshot: the agent's target setting]

Inside the code we then just access a NotesDocumentCollection of all the new documents, like so:

Dim collection as NotesDocumentCollection
Set collection = db.UnprocessedDocuments

All is well until you want to restrict which documents you process. To do this the Notes Way, you add Document Selection criteria to the agent, like so:

[Screenshot: the agent's Document Selection criteria]

 

So now the collection will be restricted to documents built using the "post" form whose Approved field has a value of "0".

At least, that's how I've always assumed it works, though I always have trouble getting it to behave as I'd expect. Recently I cooked up an alternative that I think is much cleaner and less error-prone.

The Alternative

A newbie to Notes looking at the code above might wonder how on earth it works. Unless you know about (and notice) the agent's Document Selection settings, you won't see how the documents are being selected.

Another method, much cleaner in that somebody reading your code can see what's going on directly from the script, is to build the document collection ourselves, based on the time the agent last ran.

Here's how:

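Dim session As New NotesSession
Dim db As NotesDatabase
Dim thisAgent As NotesAgent
Dim collection As NotesDocumentCollection
Dim query As String

Set db = session.CurrentDatabase
Set thisAgent = session.CurrentAgent

If thisAgent.HasRunSinceModified Then
 ' Only posts created since the agent last ran
 query = {Form="post" & @Created>=[} + CStr(thisAgent.LastRun) + {] & Approved="0"}
Else
 ' New or edited agent: fall back to posts created in the last 24 hours
 query = {Form="post" & @Created>=@Adjust(@Now; 0; 0; -1; 0; 0; 0) & Approved="0"}
End If

Set collection = db.Search(query, Nothing, 0)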

Et voilà! You can now build the collection exactly as you want it, and the next person to read the code can see immediately what's going on.

Notice that we need to cater for the agent having been modified since it last ran (or never having run), because in that case the LastRun property can't be relied on. When that happens I just look for all documents created in the last 24 hours.
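One thing to watch with `[} + CStr(thisAgent.LastRun) + {]` is that the text CStr produces, and the way a [date] constant is read back, can depend on the server's date settings. One alternative, which is my substitution rather than the code above, is to spell the cutoff out with @Date(year; month; day; hour; minute; second). The sketch below models the query-building logic in plain Java so it can run outside Notes: `PostQuery` and `buildQuery` are hypothetical names, `LocalDateTime` stands in for LastRun (assumed to be in the server's local time), and an empty `Optional` stands in for HasRunSinceModified being False.

```java
import java.time.LocalDateTime;
import java.util.Locale;
import java.util.Optional;

class PostQuery {

    // Builds the @Formula string you would pass to db.Search.
    // An empty lastRun models HasRunSinceModified = False (agent new or edited).
    static String buildQuery(Optional<LocalDateTime> lastRun) {
        String cutoff = lastRun
                // @Date(year; month; day; hour; minute; second) avoids relying on
                // how a [date] literal is formatted and parsed on the server
                .map(t -> String.format(Locale.ROOT, "@Date(%d; %d; %d; %d; %d; %d)",
                        t.getYear(), t.getMonthValue(), t.getDayOfMonth(),
                        t.getHour(), t.getMinute(), t.getSecond()))
                // Same fallback as the article: anything created in the last 24 hours
                .orElse("@Adjust(@Now; 0; 0; -1; 0; 0; 0)");
        return "Form=\"post\" & @Created>=" + cutoff + " & Approved=\"0\"";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(Optional.of(LocalDateTime.of(2009, 6, 24, 5, 9, 0))));
        System.out.println(buildQuery(Optional.empty()));
    }
}
```

In a real agent you would build the same string in LotusScript and hand it to db.Search; only the shape of the formula matters here.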

A Little Gotcha

I don't know how the logic behind the Notes Way works, but there's a gotcha with this method. If, for whatever reason, the agent takes a long time to run one day, then the LastRun time could be, say, 10 minutes after it started. Any documents created in that 10-minute gap won't be included the next time the agent runs.

You can get round this by wrapping the LastRun date in an @Adjust, like this:

query = {Form="post" & @Created>=@Adjust([} + CStr(thisAgent.LastRun) + {]; 0; 0; 0; 0; 0; -30) & Approved="0"}

I've taken 30 seconds off. How much you take off depends on how long you think the agent might take to run.
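To see why the size of the margin matters, here is a small plain-Java model of the gap. It assumes, as described above, that LastRun ends up at the time the slow run finished, and that a post created mid-run arrived after that run's db.Search had already executed. `OverlapDemo`, `cutoff` and `selected` are hypothetical names; `selected` mirrors `@Created >= cutoff`.

```java
import java.time.Duration;
import java.time.LocalDateTime;

class OverlapDemo {

    // Cutoff for the next run: LastRun pulled back by a safety margin,
    // like @Adjust([LastRun]; 0; 0; 0; 0; 0; -30) for a 30-second margin
    static LocalDateTime cutoff(LocalDateTime lastRun, Duration margin) {
        return lastRun.minus(margin);
    }

    // Mirrors @Created >= since
    static boolean selected(LocalDateTime created, LocalDateTime since) {
        return !created.isBefore(since);
    }

    public static void main(String[] args) {
        LocalDateTime started = LocalDateTime.of(2009, 6, 24, 10, 0, 0);
        // Assumption: the slow run left LastRun 10 minutes after it started
        LocalDateTime lastRun = started.plusMinutes(10);
        // Assumption: this post arrived after that run had already searched
        LocalDateTime created = started.plusMinutes(5);

        System.out.println(selected(created, cutoff(lastRun, Duration.ZERO)));          // false: missed
        System.out.println(selected(created, cutoff(lastRun, Duration.ofSeconds(30)))); // false: margin too small
        System.out.println(selected(created, cutoff(lastRun, Duration.ofMinutes(10)))); // true: caught
    }
}
```

The trade-off is that a wider margin also re-selects documents the previous run already saw, so a summary email may occasionally repeat a comment.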

Putting It To Use On CodeStore

I first put this technique to use on a "document indexing system" for a client. I remembered it last week and used it again while changing the way CodeStore's backend runs.

Until last week, every comment posted on this site triggered an individual email to me. This meant I spent a lot of time deleting email from spammers, who often attack in waves of 30 or more messages at a time. This distraction cost me time and hence money.

I now have a two-hourly agent that sends me a summary email of unapproved comments (those on blog entries older than a week). I glance over it looking for real names (it's surprisingly obvious which are real, as most of my spam comes from the likes of "zyzuaeyuwqhun"). I now delete one email every two hours rather than 30 or more. It doesn't sound like much, but it has made a big difference.

The downside is that legitimate commenters on old blog entries have to wait up to two hours to see their comment go live. This happens so rarely that it's not really an issue.

My fight to keep the site CAPTCHA-free continues!

Comments

  1. Would it be more efficient to specify the cutoff date in the db.search instead of "nothing"? You could set it to agent.lastRun - 30 seconds.

    • Flaz
    • Wed 24 Jun 2009 05:09 AM

    The NotesDateTime param in the Search method searches documents created OR modified since the specified date.

    So it's not what Jake was looking for...

    • Jake Howlett
    • Wed 24 Jun 2009 05:12 AM

    I was going to reply and say I don't know because I don't ever use that second option to .search(). Flaz beat me to it though.

    If you're looking for either new or modified docs then I guess it could be the lesser of two evils.

    Jake

  2. An interesting article on the topic:

    http://www-10.lotus.com/ldd/bpmpblog.nsf/dx/db.search-date-vs.-no-date?opendocument&comments

  3. How funny.

    'Until last week every comment posted on this site triggered an individual email to me. '

    As you will know (because I told you) I've been using your blog template on one of my sites for a while.

    Because of the nature of my business, I have Administrator open all day on a terminal, on the Server Console.

    The regular message 'Unable to send mail to Jake Howlett.....Name not found in domino directory' used to drive me mad. I kept promising myself I'd investigate.

    Now I know why!

    • Jorge
    • Wed 24 Jun 2009 07:02 AM

    Why not have a field that acts as a flag to determine whether a post is live on the site? Your agent could then simply look every two hours for posts that are not live (flag=false), rather than having to worry about time stamps.

  4. Hi Jake,

    Good stuff. I never really investigated the .HasRunSinceModified. Very interesting.

    I know it's off topic, but on the subject of spammers and their ways: I just got hit slowly on a database that I hadn't looked over in a while (three months). I opened up the reports and found that a spammer had hit the contact form 90,000 times (it took me two hours to delete them in Notes). Luckily, thanks to a little CSS trickery, the system only sends emails for valid contacts (that is, those sent by humans), so the customer never knew it was happening.

    However, I believe we're all going to have to move to captchas eventually. Look on the bright side, you're helping computers read books.

    • Jake Howlett
    • Wed 24 Jun 2009 07:20 AM

    That's what it does, Jorge (Approved="0"). Without the time stamp, the email would grow and grow until I removed the non-live documents from the database, and I wouldn't know whether any had been posted since the agent last ran without comparing it to the previous email.

    Matt: "I believe we're all going to have to move to captchas eventually"

    I really don't *want* to believe. It will be a cold day in hell when I add one on here.

    I'd prefer to think something equally clever will come about before then which prevents spambots and doesn't require captchas. Captchas are evil!

    Jake

  5. Just thinking out loud, but to avoid captchas, couldn't you set up another names.nsf for anyone who wants to make a comment and then authenticate their name?

    I know it's more work, but that would cut down on spammers.

    Just a thought.

    Great post by the way, I can see a lot of potential use. Thanks!!

    • Rob
    • Wed 24 Jun 2009 11:22 AM

    I don't want no captchas!

    Didn't you have a discussion here some time back about other ways to detect automated SPAM bots? I recall hidden fields that the bots would fill in but humans wouldn't. (I'd like to talk about this more.)

    REGISTRATION

    The solution I'd prefer would be to have to register to post a comment. The registration would validate the email address by sending an email to the person registering that had a link they had to click.

    If you were very paranoid, then the page that appears when the validation link is clicked could have a captcha. This would require only one captcha per registration. I could live with that.

    I've used this very system on several projects.

    AGENT LAST RUN DATE/TIME

    I guess you're saying that the built-in agent last run date/time is the time when the agent last finished running, not when it started. (I've never paid any attention to that.)

    Instead, why not store the current date/time in the agent storage area (I forget what that's called) as the very first thing the agent does when it starts? (Well, you'd have to fetch the previous value, if any, first, and then store the new start time.)

    This would give you the agent last run start time very accurately. If the agent was "never run" then the variable would not be present. If it is present then that time would be when the agent started last.

    Frankly I avoid doing date-based searches when I can think of any other way. You have the "Approved" flag on the Post form. Does this get set true on every Post eventually? I'm wondering why your first query (Form="Post" & Approved=0) isn't sufficient.

    Or have I just gone off topic from how to improve on "All new & modified documents"?

    Peace,

    Rob:-]

    • Jake Howlett
    • Wed 24 Jun 2009 03:10 PM

    Rob. You want to have to register to post a comment? What are you, crazy!? Why would you want to see anything other than the easiest option possible?

    The only down-side to the current approach is that anybody can post using anybody else's name.

    What's this "agent storage area" of which you talk?? Tell me more.

    The Approved flag is either 1 or 0 and depends (mainly) on whether the blog entry to which it's a reply is older than 7 days. If so, it's 0. If the entry is younger than a week (the time it seems to take spammers to become aware of a new entry), then it computes to 1 and things appear live.

    {Form="Post" & Approved=0} would be sufficient to find the posts I need to know about. But unless I kept on top of them and deleted them, the email I received would grow and grow with time.

    Jake

    • Rob
    • Wed 24 Jun 2009 04:45 PM

    Well, I'd take registration once over captcha on every comment.

    I forgot you don't require an email address to post, but if you required a user name and email address then that really is a registration. If you kept a list of poster email addresses that you've approved, either passively or actively, then the agent could approve any post with that email address.

    You could take it a step farther and send a validation message for any comments with email addresses that aren't in the white list. You don't even need passwords or user names. Just a thought.

    I don't have access to my Notes documentation here, but the agent storage area I'm talking about is a place where you can store data that stays with the agent. When you modify the agent, the data gets deleted. This is where the agent keeps track of which documents it has processed. In Java and LotusScript you have to make a call for each document you process, which saves something (likely the UNID) to tell the agent that it has processed that document. (I think it's a target setting in the agent properties called something like "Unprocessed documents", but it's only automatic for @ formula agents.)

    I'll look this up tonight and post something more accurate. I've only used this feature a couple of times.

    Peace,

    Rob:-]

    • Rob
    • Thu 25 Jun 2009 01:25 AM

    Hi Jake,

    I'm looking at the Designer help for version 8.5, but this has been the same for the last two or three versions at least.

    Look in the AgentContext class for the SavedData method. It returns the agent document, which preserves data between agent runs. It behaves much like a profile document, except that its data is cleared if you edit the agent.

    Here's how to get the agent document:

    Java

    Document agentDoc = agentContext.getSavedData();

    LotusScript

    Set notesDocument = notesSession.SavedData

    I think this "agent document" is used to track "unprocessed" documents. These documents are fetched with the Java method getUnprocessedDocuments() and the LotusScript property db.UnprocessedDocuments. A document is marked as processed on an agent-by-agent basis by each agent calling updateProcessedDoc() in Java or session.UpdateProcessedDoc in LotusScript.

    Now that I think about it, you could save the agent start time in a profile document field. Then the agent would know when it was last started even after it's been edited.

    Peace,

    Rob:-]
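Rob's suggestion of recording the agent's own start time (in the SavedData document, or in a profile document so it survives edits to the agent) can be sketched like this. It's a plain-Java model, not Domino code: `StartTimeTracker`, `beginRun` and the `LastStart` key are hypothetical, and the HashMap stands in for whichever persistent document you choose.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class StartTimeTracker {

    // Hypothetical stand-in for the persistent store: in a real agent this would
    // be an item on the SavedData document or on a profile document.
    private final Map<String, LocalDateTime> store = new HashMap<>();
    private static final String KEY = "LastStart"; // hypothetical item name

    // Call first thing when the agent starts: read the previous start time
    // (empty if there isn't one yet), then record this run's start time.
    Optional<LocalDateTime> beginRun(LocalDateTime now) {
        Optional<LocalDateTime> previous = Optional.ofNullable(store.get(KEY));
        store.put(KEY, now);
        return previous;
    }

    public static void main(String[] args) {
        StartTimeTracker tracker = new StartTimeTracker();
        LocalDateTime firstStart = LocalDateTime.of(2009, 6, 24, 10, 0);
        LocalDateTime secondStart = firstStart.plusHours(2);

        // First run: nothing stored yet, so fall back (e.g. to the last 24 hours)
        System.out.println(tracker.beginRun(firstStart));  // Optional.empty
        // Second run: cutoff is when the first run started, however long it took
        System.out.println(tracker.beginRun(secondStart)); // Optional[2009-06-24T10:00]
    }
}
```

Because the cutoff is the previous run's start rather than its finish, a slow run no longer opens a gap. Documents created between that start and its db.Search call may be picked up twice, and if the store lives on the SavedData document, editing the agent clears it and the empty result sends you down the fallback path.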


About This Page

Written by Jake Howlett on Wed 24 Jun 2009



About This Website

CodeStore is all about web development. Concentrating on Lotus Domino, ASP.NET, Flex, SharePoint and all things internet.

Your host is Jake Howlett who runs his own web development company called Rockall Design and is always on the lookout for new and interesting work to do.

You can find me on Twitter and on LinkedIn.
