Documentation/Labs/VTK-String


Problem

Currently, the encoding of strings in VTK is not explicitly specified. As a result, behavior is often incorrect when a string is received from an external library or passed to an operating system call (e.g., reading or writing files).

For example:

Files that have non-ASCII characters in their name cannot be opened.
Changing the application locale (so that some necessary special characters can be stored in a single byte) may cause generated files to become invalid (e.g., because the decimal point is replaced by a decimal comma).
Python and Qt store strings with a known encoding, but there is no way to convert them to or from VTK strings without loss of information.
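The decimal-comma problem in the second example can be avoided on the writer side by formatting numbers with a fixed locale instead of the application's global one. The following is a minimal sketch using only the C++ standard library; the function name FormatNumberForFile is hypothetical and is not part of VTK.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not a VTK function): format a floating-point value
// for a file format that requires '.' as the decimal separator, no matter
// which global locale the application has selected.
std::string FormatNumberForFile(double value)
{
  std::ostringstream stream;
  // The classic "C" locale always uses '.' as the decimal point, so the
  // output does not depend on std::locale::global() or setlocale().
  stream.imbue(std::locale::classic());
  stream << value;
  return stream.str();
}
```

With this approach the application can still change its locale for other reasons, while file output stays independent of that choice.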

VTK has a vtkUnicodeString class that stores strings with a known encoding. It can store and provide strings in UTF-8 and UTF-16 encoding. It is already used extensively in text rendering, arrays, tables, and certain file export/import code, but the majority of VTK still uses const char* and get/set macros.

Storing strings as const char*, managing their memory with vtkSetStringMacro/vtkGetStringMacro, and processing them with C string functions are all very outdated programming practices. VTK-based applications must either follow VTK's approach and be stuck with these practices, or break away from it, live with inconsistencies in the code base, and manage strings cautiously (additional conversions and null-pointer checks are needed). Neither option is good.
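To illustrate the difference, here is a simplified sketch that contrasts a hand-managed const char* attribute with a std::string attribute. Both classes (LegacyNamed and ModernNamed) are hypothetical and only approximate the pattern; this is not the actual expansion of vtkSetStringMacro/vtkGetStringMacro.

```cpp
#include <cstring>
#include <string>

// Hypothetical class imitating the hand-managed const char* pattern.
class LegacyNamed
{
public:
  LegacyNamed() : Name(nullptr) {}
  ~LegacyNamed() { delete[] this->Name; }
  // Copying would require a deep copy of the buffer, so it is disabled here.
  LegacyNamed(const LegacyNamed&) = delete;
  LegacyNamed& operator=(const LegacyNamed&) = delete;

  void SetName(const char* name)
  {
    // Copy the new value first, so that passing the current buffer is safe.
    char* copy = nullptr;
    if (name)
    {
      copy = new char[std::strlen(name) + 1];
      std::strcpy(copy, name);
    }
    delete[] this->Name;
    this->Name = copy;
  }

  // May return nullptr: every caller has to check before using the result.
  const char* GetName() const { return this->Name; }

private:
  char* Name;
};

// Hypothetical class holding the same attribute as a std::string.
class ModernNamed
{
public:
  void SetName(const std::string& name) { this->Name = name; }
  // Never null; an unset name is simply an empty string.
  const std::string& GetName() const { return this->Name; }

private:
  std::string Name;
};
```

The second class needs no destructor, no manual copy, and no null check at the call site. Neither class records which encoding the bytes are in, which is the gap the proposal below addresses.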

See some more discussion here: https://discourse.vtk.org/t/proposal-should-we-replace-vtkstdstring-with-std-string/796/14

Proposal

Design

Use an encoding-aware string class in VTK to store all strings.

vtkUnicodeString is a good starting point: it can store any string with a known encoding, and it is already in VTK and used in a number of VTK classes.

It could be renamed to vtkString to make the name shorter. The name is also clearer if it does not include a particular encoding (in the future the string class might support multiple encodings, not necessarily just Unicode). This would also be consistent with how other libraries manage strings (see, for example, Qt's QString).

Plan

Rename vtkUnicodeString to vtkString. Possibly improve the API by adding "get as Latin1" and "set from Latin1" methods.
Replace all string attributes in VTK classes with vtkString (add new get/set macros, create the object instance in the constructor).
Update the Python wrapping.
Possibly add automatic converters to const char* (which can be disabled by CMake flags) to make updating application code easier.
Review all operating system calls and make sure strings are properly converted.
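The Latin1 accessors mentioned in the first step could be built on a conversion like the one sketched below. This is only an illustration under the assumption that the string class stores UTF-8 internally; the function names Latin1ToUtf8 and Utf8ToLatin1 are hypothetical and do not exist in VTK.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert Latin-1 (ISO 8859-1) bytes to UTF-8.
// Every Latin-1 byte maps directly to the code point with the same value.
std::string Latin1ToUtf8(const std::string& latin1)
{
  std::string utf8;
  utf8.reserve(latin1.size());
  for (unsigned char c : latin1)
  {
    if (c < 0x80)
    {
      utf8.push_back(static_cast<char>(c)); // ASCII is unchanged
    }
    else
    {
      // U+0080..U+00FF are encoded as two bytes in UTF-8.
      utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
      utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return utf8;
}

// Hypothetical helper: convert UTF-8 to Latin-1. Throws
// std::invalid_argument if the input is malformed or contains a character
// that Latin-1 cannot represent, instead of silently losing information.
std::string Utf8ToLatin1(const std::string& utf8)
{
  std::string latin1;
  latin1.reserve(utf8.size());
  for (std::size_t i = 0; i < utf8.size(); ++i)
  {
    const unsigned char c = static_cast<unsigned char>(utf8[i]);
    if (c < 0x80)
    {
      latin1.push_back(static_cast<char>(c));
      continue;
    }
    // Only the lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
    if ((c != 0xC2 && c != 0xC3) || i + 1 >= utf8.size())
    {
      throw std::invalid_argument("not representable in Latin-1");
    }
    const unsigned char next = static_cast<unsigned char>(utf8[++i]);
    if ((next & 0xC0) != 0x80)
    {
      throw std::invalid_argument("malformed UTF-8 sequence");
    }
    latin1.push_back(static_cast<char>(((c & 0x03) << 6) | (next & 0x3F)));
  }
  return latin1;
}
```

Reporting an error for unrepresentable characters is one possible design choice; a real API might instead substitute a replacement character or return a status flag.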
