Working With Character Sets and Domino

Introduction

Just when you think you know all there is to know about web development you're brought crashing back down to earth. Until recently I hadn't really paid much attention to nor had any issues with character sets and Domino. Things had always just worked as I'd expected them to. Enter special characters, such as é, and Domino would store them correctly. Then, in response to a blog entry about Cookies, a thread began in which these special characters weren't being saved or displayed properly. I've since been spending a lot of time trying to solve this character-encoding enigma. What I found I think is worth sharing.

About Character Sets

What are character sets? Good question and one I don't feel qualified to answer. You can find everything you (n)ever wanted to know about Character Encoding on Wikipedia.

In layman's terms (i.e. my understanding of it) it's like this. There are standard characters, like 0-9, A-Z, a-z along with things like !"$%^&*() etc. Stuff you see on a normal "English" keyboard, and these can be represented using the basic character set called ASCII. (The £ on a UK keyboard is one exception: it isn't part of ASCII.) If your site only ever needs to use this set of characters you're ok. If, however, you need to cater for other languages that use different characters you need to use another character set. Simple really, isn't it?

But, how do I change the character set used and which one should it be? Well, I'll get to the how later on. As for which, well, that depends. Why not use UTF-8, which covers just about everything? Or you could use ISO-8859-1, which covers Western European languages.
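To make the difference between these character sets a little more concrete, here's a rough sketch in Python (nothing to do with Domino itself). The helper name can_encode and the sample strings are my own, made up for illustration. It simply asks whether each sample can be represented in ASCII, ISO-8859-1 and UTF-8.

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character in `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, chosen only to show where each charset gives up
samples = ["Hello", "café", "£5", "€5", "日本語"]
charsets = ["ascii", "iso-8859-1", "utf-8"]

for sample in samples:
    verdicts = ", ".join(
        f"{cs}: {'yes' if can_encode(sample, cs) else 'no'}" for cs in charsets
    )
    # ascii() escapes non-ASCII characters so the report prints on any console
    print(f"{ascii(sample)} -> {verdicts}")
```

Plain English text fits in all three. The accented and pound-sign samples need at least ISO-8859-1, and the last two samples only fit in UTF-8.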

Which to use is a debate all in itself and one that I don't truly "get". Let's forget it for now and look at the issue I had with Domino that made me stop and think about how they are used on this site.

The Scenario

The problem was that certain characters, submitted in response to blog entries, weren't storing properly. They would be sent back to the browser as square blobs like this ■ (the character &#9632;, whatever that may be).

Responses to blog entries on this site aren't posted using real Domino forms. Instead they are created with "in-line" forms.

The same was happening for blog entries and articles. In the new version of CodeStore responses to articles are now added in the same way as blog responses - using faked forms. You can see one at the bottom of this article. Until I fixed the issue these documents wouldn't store certain characters either.

The really weird thing was that it worked when adding a response directly using the actual Domino form. With this in mind I went about trying to find out what the difference between the two pages was. Why did one work and not the other?!

In terms of the HTML that made up the form and the fields contained within it, the two forms were almost identical. The only difference I could see was its action parameter. For the in-line form the HTML is:

<form method="post" action="post?CreateDocument&ParentUNID=DOCID">

And, for the "direct" Domino Form, the HTML is:

<form method="post" action="post?OpenForm&ParentUNID=DOCID&Seq=1"
name="_post">

The difference here is the URL by which Domino received the data. The actual Form used by Domino is the same. It's just a different way of creating the same document. There's no reason it should be a problem with charsets.

But, this was irrelevant. Even before Domino received the form's values, the browser had added another difference. Using an HTTP Sniffer I was able to intercept the data being sent to Domino by each form. Sending the words inliné and diréct in the respective forms actually sent inlin%E9 and dir%C3%A9ct. It appeared that each form was sending the data using a different encoding. Not only that, but one method seemed to be encoding twice. Something was very wrong. The mystery intensified until I happened to notice something else in the sniffer.
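If you want to see where the two different byte sequences for é come from, here's a minimal sketch using Python's standard urllib.parse.quote_plus. The function name form_encode is my own invention, and this only approximates what a browser does when it URL-encodes a form field. In ISO-8859-1 the é is the single byte E9. In UTF-8 it is the two bytes C3 A9, which is why one value can look like it has been "encoded twice".

```python
from urllib.parse import quote_plus


def form_encode(value: str, page_charset: str) -> str:
    """Percent-encode a field value using the charset of the containing page.

    Hypothetical helper: an approximation of browser behaviour, not a copy of it.
    """
    return quote_plus(value, encoding=page_charset)


# The same kind of word, encoded for a Western page and for a UTF-8 page
latin1_value = form_encode("inliné", "iso-8859-1")
utf8_value = form_encode("diréct", "utf-8")

print(latin1_value)  # inlin%E9
print(utf8_value)    # dir%C3%A9ct
```

Neither value is wrong on its own. The trouble only starts when the receiver assumes one charset and is sent the other.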

The Problem

What I had failed to notice for a long time was the glaringly obvious. In the screen-grab below we can see the content-type of each type of "form". The inline form first and then the direct one:

screengrab

Notice how the charsets in the content-type headers are different. After some further reading I found that browsers send form data using the same encoding as that of the page in which the form is contained. Our direct form is using a different charset to the document in which our in-line form lives. By adding a fake form to a document we are using a different charset to that used by the Domino forms.

So we know the root of the problem. What about the cause of the problem? Well, by default, Domino uses a different charset for documents in read-mode than it does for documents in edit-mode. As you can see from the screen-grab of the Server Document below, read-mode documents are considered "output" and so not encoded as UTF-8. Edit-mode documents are "HTML forms" and so encoded using UTF-8, rather than the default Western charset of ISO-8859-1. Hence the difference in the HTTP sniffer screen-grab above.
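If you'd rather compare headers in a script than squint at a sniffer, something like the following sketch would do it. It uses Python's standard email.message.Message to pull the charset out of a Content-Type value. The function name charset_of and the two header strings are assumptions of mine for illustration. They are not copied from the screen-grab.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Illustrative header values (assumed, not taken from a real Domino response)
read_mode_header = "text/html; charset=ISO-8859-1"
edit_mode_header = "text/html; charset=UTF-8"

read_charset = charset_of(read_mode_header)
edit_charset = charset_of(edit_mode_header)

if read_charset != edit_charset:
    print(f"Mismatch: page is {read_charset}, form handler expects {edit_charset}")
```

A header with no charset parameter at all gives None, which is worth treating as a warning sign in its own right.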

screengrab

The real underlying problem is that Domino expects forms to be sent to it using UTF-8 encoding, as per the server setting. It doesn't expect us to create our own forms and so doesn't expect data to be sent in any other charset. Sending the form data encoded as ISO-8859-1 is going to cause problems. Hence the square blobs.
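Here's a small sketch of that mismatch, again in Python and again only an illustration. It takes a value sent as ISO-8859-1 and decodes it as if it were UTF-8. Note that Python substitutes its own replacement character (U+FFFD) for the bad byte. I'm not claiming this is what Domino does internally, only that the same kind of mix-up produces the same kind of rubbish.

```python
from urllib.parse import unquote_to_bytes

# What arrives on the wire when é is submitted from an ISO-8859-1 page
raw = unquote_to_bytes("inlin%E9")

# A receiver that assumes UTF-8 cannot make sense of the lone E9 byte
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the charset the sender actually used recovers the word
right = raw.decode("iso-8859-1")

print(ascii(wrong))  # 'inlin\ufffd'
print(ascii(right))  # 'inlin\xe9'
```

The bytes were never damaged in transit. The only thing that changed between the two results is the charset the receiver assumed.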

The Solution

The solution is obvious. We need to force our "fake" form to send data encoded in the same charset as Domino uses for its Forms (UTF-8 in my case). The easiest way to do this is to add an accept-charset attribute to our in-line form, like so:

<form method="post" action="post?CreateDocument&ParentUNID=DOCID" accept-charset="UTF-8">

This solved the problem. Well, in Mozilla at least. Not in Internet Explorer, where it seemed to still use the page's charset as the overriding setting.

To get this to work in Internet Explorer and Firefox (et al) the quick way, you can change the actual Domino Form, overriding the character set it uses. To do this, open the Form's Property Box and find the "Character set" setting on the propeller hat tab. Change this to UTF-8, as below.

screengrab

Domino now uses the same character set whether we are reading or editing with this Form. Problem solved!

However, solving it this way doesn't help with other forms that may have the same problem elsewhere. As an alternative we could change the server setting we talked about earlier, so that all output from the server is in the same character set. This would solve the problem on all future Forms we create using this technique.

Meta or Header?

If you're like me you might be wondering why the charset is being sent to the browser as part of the content-type response header. I've not seen many other servers doing this. Have you? Well, it turns out this is the W3C advised way of doing it. If you prefer to use HTML Meta tags there's a server setting that controls it. Here are the defaults:

screengrab

If you swapped these settings around then you'd see the character set as part of the HTML Head of the page, like so:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Alternatively you could disable both these settings and add this meta tag yourself as part of the $$HTMLHead field's value. It's a brave person who does this though. I rarely say this but I think it might be better to leave this task to Domino.
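If you do go down the meta tag route it's worth checking which charset a page actually ends up declaring. The sketch below is one way to do that with Python's standard html.parser. The class and function names are my own, and it assumes the HTTP header wins over the meta tag when both are present, which is my understanding of how browsers behave.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


def charset_param(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type style value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Remember the charset from the first content-type meta tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_param(attributes.get("content", ""))


def declared_charset(header: Optional[str], html: str) -> Optional[str]:
    """Return the header charset if there is one, else fall back to the meta tag."""
    if header:
        from_header = charset_param(header)
        if from_header:
            return from_header
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset


page = (
    '<html><head><meta http-equiv="content-type" '
    'content="text/html; charset=utf-8" /></head><body></body></html>'
)
print(declared_charset("text/html", page))
```

Feed it the header and body from your sniffer and you can see at a glance whether the two agree.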

As far as CodeStore goes I have made sure Domino uses the same character sets for all output (and input) across the server by editing the server document and restarting the HTTP task. No more blobs round here...

Summary

Character sets are essential in Domino applications where there's a chance that international characters are going to be used. For the most part you can rest assured that Domino will take care of their use well enough. The problem comes about when we think for ourselves and start to send data to the server in ways it wouldn't normally expect.

Even if you never see yourself doing any weirdness like this it's still worth knowing, as a Web Developer, that the issue exists. It's essential to have a full understanding of the behind-the-scenes communication between browser and server.

There's no need to spend forever becoming an expert on all this, but a good primer is the O'Reilly HTTP Pocket Reference. You can digest this pocket-sized book in a few hours and discover as much as you'll ever need to know. Next time you're in the computing section of a book shop search it out and add it to your shelf!

Addendum

It's worth noting that there's another problem with Domino. Yep, you heard me right, another problem.

Despite telling it to use UTF-8 for input and output it still insists on using US-ASCII for things like views on $$ViewTemplate forms. Others have had the same problem — here and here.

To make sure every page on CodeStore uses the same charset I changed the Character Set of all Forms. Nothing's ever easy with Domino, is it…

Feedback

  1. Brilliant!

    Congratulations on having solved another one of Domino's quirks! I've never really paid any attention to these settings, but on the other hand, I haven't created that many in-line forms yet either.

    /Peter

      • Jake Howlett
      • Thu 10 Mar 2005

      Re: Brilliant!

      Cheers Peter. Hopefully there's more to learn from this article than how to solve the in-line form problem. It's good to know what's going on back there.

    • Fredrik
    • Thu 10 Mar 2005

    Great article!

    Great article Jake!

    I have made the same discovery as you. I was involved in a large project with the goal of supporting double-byte characters (for multi-language support). Sometimes the charset was changed from UTF-8 to US-ASCII, which makes Domino a bit unreliable (this happens a lot if you use ?ReadForm).

    I have also noticed that if you enter a "foreign" letter/character directly on your form/page (such as ö) the charset is switched back to UTF-8 ... (if I can remember correctly)...

    Changing the charset on all design elements and in the server document seems to be the only way to ensure UTF-8 charset...

  2. How about plain html file names

    Your page about Character Sets in Domino is very worth reading, but unfortunately it didn't solve my character set problem when serving plain html pages on our domino server. Pages that contain special characters (e.g. tést.htm) display correctly on a Windows+Domino test server, but on a Linux+Domino production server they result in page not found errors. Do you see an explanation for this behaviour? And a solution?

      • Jake Howlett
      • Fri 6 May 2005

      Re: How about plain html file names

      Sorry Erwin. Don't know the answer to this one. Linux/Domino seems funny about what files it will serve. I had to change my stylesheet as it didn't like it being called global.css. Damino.

  3. Thanks!

    Hi, Jake!

    With the help of this article I've fixed my problem with the special characters from the Spanish alphabet: you know ... á, é, ..., ñ, ... etc.

    Thanks for this article and your help!

    You can see the results at http://juanfco.ruiz.name

    Note : a spanish site . ;-) "Forastero en Tierra Extraña" is member of NotesRing - Spanish community of Notes/Domino Developers ( http://www.notesring.es/ )

  4. Special characters in field names?

    Hello,

    I'm not sure, whether someone's reading such late responses to your articles... but in my actual test using Adobe Flex with Domino I ran into a character set problem:

    My problems are field names with a special character (German umlauts like ä, ö, ü) and how Domino handles them in an HTTP POST CreateDocument action: it does not work at all... Flex sends the form data in UTF-8, the Notes form itself is also set to the UTF-8 charset, but a newly created document does not receive the values for such fields with an umlaut.

    I took a HTTP sniffer and looked into Domino-generated forms and tried to figure out, how Domino handles such fields:

    In a Domino-generated HTML form the normal fields get a tag like <a name="_RefreshFieldName"></a>, where FieldName corresponds to the real field name on the Notes form. Fields with a special character get a tag like <a name="_RefreshKW__4adq74pb3ddil50b3dc_"></a> with a totally cryptic field name. :-(

    When the Domino-generated webpage sends the data with the createdocument action, then the POST parameter for special character fields is this cryptic fieldname... and this of course works...

    So, how does Domino create this cryptic fieldname, because this one works, when the Domino Server receives the parameter for a createdocument action...?

    I can use this cryptic fieldnames in my Flex Application, but it can not be the solution to look in Domino-generated code, before I code my Flex applications...

    One result for me is of course: Never use special characters in field names... but these are old Notes databases, which I like to web-enable with Flex...

  5. thanks for this post

    While working on an existing app it turned out that the developer had explicitly written the character set in a $$HTMLHead field somewhere in a subform.

    It took a while to find this out, so hopefully the tip will save someone else some time...

    • Richa Sharma
    • Wed 8 Feb 2012

    thanks for this post

    I have read all the above posts about character sets. I have been working on a French application and I am facing this issue mainly in JS alerts, so could you please suggest which approach is better: changing the character set in the form properties, or changing the server setting 'Use UTF-8 for output' to Yes?

About This Article

Author: Jake Howlett
Category: Forms
Keywords: character set; utf-8; us-ascii; encoding; charset;
