Class String.

Inherits Garbage

A generic 8-bit string class.

The string data are counted, so null bytes are allowed, and most operations are very fast.

The data structure uses a simplified variant of reference counting, where only "one" and "many" are possible. The detach() function ensures that the count is "one" afterwards. Many functions leave the count on "many", even ones such as mid().
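
The "one"/"many" scheme can be illustrated with a small stand-alone sketch. CowString is a hypothetical stand-in, not the real String class, and it assumes std::shared_ptr for buffer lifetime where the real class relies on its Garbage base.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the "one"/"many" scheme; not the real String.
class CowString {
public:
    explicit CowString(const std::string &s)
        : d(std::make_shared<std::string>(s)), many(false) {}

    // Copying shares the buffer and marks both handles as "many".
    CowString(const CowString &other) : d(other.d), many(true) {
        other.many = true;
    }
    CowString &operator=(const CowString &) = delete; // keeps the sketch small

    // After detach(), this handle has a private buffer ("one").
    void detach() {
        if (many) {
            d = std::make_shared<std::string>(*d);
            many = false;
        }
    }

    // Every mutating function detaches before it modifies the data.
    void append(char c) {
        detach();
        d->push_back(c);
    }

    const std::string &str() const { return *d; }
    bool shared() const { return many; }

private:
    std::shared_ptr<std::string> d;
    mutable bool many; // only "one" (false) or "many" (true) is tracked
};
```

Note that the handle that was copied from stays on "many" even after the copy has detached. With only two states there is no way to count back down, which is the price of the simplification.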

So far only a small set of core operations is implemented, including length(), cstr() (to convert to const char *) and data() (to get at the raw data).

String::String( const String & s )

Creates a copy of s.

String::String( const char * s )

Creates a String from the NUL-terminated string s. The NUL is not copied.

String::String( const char * s, uint n )

Creates a String from the first n bytes of s, which may contain NULs.

String::String()

Creates an empty String.

Reimplements Garbage::Garbage().

String String::anonymised() const

Returns a copy of this string where most/all content has been replaced with the letter 'x' or the digit '4', but if the message was an RFC 822 message, it keeps the same parse tree.

Specifically, most ASCII words are changed to xxxx, while most/all syntax elements are kept.

This function is very, very slow. That's okay since it's only used for sending bug reports to us, and we all know, that's not a common case.

void String::append( char c )

This version of append() appends the single character c.

void String::append( const String & other )

Appends other to this string.

void String::append( const char * s )

This version of append() appends the null-terminated string s, or does nothing if s is null.

void String::append( const char * base, uint num )

This version of append() appends num raw bytes from memory base. If base is null, this function does nothing.

bool String::boring( Boring b ) const

Returns true if this string is really boring, and false if it's empty or contains at least one character that may warrant quoting in some context. So far RFC 822 atoms, 2822 atoms, IMAP atoms and MIME tokens are considered.

This function considers the intersection of those character classes to be the Totally boring subset. If b is not its default value, it may include other characters.

uint String::capacity() const

Returns the capacity of the string variable, that is, how long the string can be before it has to allocate memory.

int String::compare( const String & other ) const

Returns -1 if this string is lexicographically before other, 0 if they are the same, and 1 if this string is lexicographically after other.

The comparison is case sensitive - just a byte comparison.
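
A minimal sketch of the same contract on top of std::string, which also compares bytes. compareBytes is a hypothetical name used only for illustration.

```cpp
#include <string>

// Hypothetical helper: byte-wise comparison normalised to -1, 0 or 1.
int compareBytes(const std::string &a, const std::string &b) {
    int r = a.compare(b); // may return any negative or positive value
    return r < 0 ? -1 : (r > 0 ? 1 : 0);
}
```

Since the comparison is on bytes, "A" sorts before "a", and a string sorts after any proper prefix of itself.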

bool String::contains( const String & s ) const

Returns true if this string contains at least one instance of s.

bool String::contains( const char c ) const

Returns true if this string contains at least one instance of c.

bool String::containsWord( const String & s ) const

Returns true if this string contains at least one instance of s, and the characters before and after the occurrence aren't letters.
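
The word-boundary rule can be sketched as follows. This is a stand-alone illustration on std::string, not the real implementation; it assumes that "letter" means std::isalpha() in the C locale and that an empty search word never matches.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for containsWord(), written against std::string.
bool containsWord(const std::string &text, const std::string &word) {
    if (word.empty())
        return false; // assumption: an empty word never matches
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        std::size_t end = pos + word.size();
        bool before = pos == 0 ||
            !std::isalpha(static_cast<unsigned char>(text[pos - 1]));
        bool after = end == text.size() ||
            !std::isalpha(static_cast<unsigned char>(text[end]));
        if (before && after)
            return true;
        pos = text.find(word, pos + 1); // try the next occurrence
    }
    return false;
}
```

The start and the end of the string count as non-letters here, so a word at either edge still matches.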

String String::crlf() const

Returns a copy of this string where every linefeed is CRLF, and where the last two characters are CRLF.

const char * String::cstr()

Returns the zero-terminated byte representation of the string. Note that even though the return value is zero-terminated, it can also contain null bytes in the middle.

Even though this function modifies memory, it doesn't detach(), since it doesn't modify the string. However, in most cases its call to reserve() causes a detach().

const char * String::cstr() const

This const version of cstr() is the same as the non-const version above. The only difference is that it can be called on a const object, and that it may cause some memory allocation elsewhere.

const char * String::data() const

Returns a pointer to the string's byte representation, which is NOT necessarily zero-terminated.

String String::de64() const

Decodes this string using the base-64 algorithm and returns the result.

String String::deQP( bool underscore ) const

Decodes this string according to the quoted-printable algorithm, and returns the result. Errors are overlooked, to cope with all the mail-munging brokenware in the great big world.

If underscore is true, underscores in the input are translated into spaces (as specified in RFC 2047).

String String::deURI() const

Returns a version of this String with every %xx escape replaced with the corresponding character (as used to encode URIs). Invalid escape sequences are left unchanged, so this function cannot be used for input from potentially malevolent sources.
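
A sketch of the decoding rule described above, written against std::string. percentDecode and hexValue are hypothetical names; accepting both upper-case and lower-case hex digits is an assumption.

```cpp
#include <cstddef>
#include <string>

// Returns the value of a hex digit, or -1 if c is not one.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical stand-in for deURI(): valid %xx escapes are decoded,
// anything else (including a truncated escape) is copied unchanged.
std::string percentDecode(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Because invalid escapes pass through silently, the caller cannot tell a literal "%zz" from a malformed escape, and "%00" decodes to a null byte. That is why the result should not be trusted when the input comes from a hostile source.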

String String::deUue() const

An implementation of uudecode, sufficient to handle some occurrences of "content-transfer-encoding: x-uuencode" seen. Possibly not correct according to POSIX 1003.2b, who knows.

String String::decode( Encoding e ) const

Returns a version of this String decoded according to e.

void String::detach()

Ensures that the string is modifiable. All String functions call this prior to modifying the string.

String String::e64( uint lineLength ) const

Encodes this string using the base-64 algorithm and returns the result in lines of at most lineLength characters. If lineLength is not supplied, e64() returns a single line devoid of whitespace.

String String::eQP( bool underscore ) const

Encodes this string using the quoted-printable algorithm and returns the encoded version. In the encoded version, all line feeds are CRLF, and soft line feeds are positioned so that the q-p looks as good as it can.

Note that this function is slightly incompatible with RFC 2646: It encodes trailing spaces, as suggested in RFC 2045, but RFC 2646 suggests that if trailing spaces are the only reason to q-p, then the message should not be encoded.

If underscore is present and true, this function uses the variant of q-p specified by RFC 2047, where a space is encoded as an underscore and a few more characters need to be encoded.

String String::eURI() const

Returns a version of this String with absolutely nothing changed. (This function is eventually intended to percent-escape URIs, the opposite of deURI().)

String String::encode( Encoding e, uint n ) const

Returns an e encoded version of this String. If e is Base64, then n specifies the maximum line length. The default is 0, i.e. no limit.

This function does not support Uuencode. If e is Uuencode, it returns the input string.

bool String::endsWith( const String & suffix ) const

Returns true if this string ends with suffix, and false if it does not.

bool String::endsWith( const char * suffix ) const

Returns true if this string ends with suffix, and false if it does not.

int String::find( char c, int i ) const

Returns the position of the first occurrence of c on or after i in this string, or -1 if there is none.

int String::find( const String & s, int i ) const

Returns the position of the first occurrence of s on or after i in this string, or -1 if there is none.

String String::fromNumber( int64 n, uint base )

Returns a string representing the number n in the base system, which is 10 (decimal) by default and must be in the range 2-36.

For 0, "0" is returned.

For bases 11-36, lower-case letters are used for the digits beyond 9.
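
A stand-alone sketch of the conversion, returning std::string. The handling of negative numbers (a leading '-') is an assumption, since the text above does not say how they are rendered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for fromNumber(), returning std::string.
std::string fromNumber(std::int64_t n, unsigned base = 10) {
    assert(base >= 2 && base <= 36);
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    if (n == 0)
        return "0";
    bool negative = n < 0;
    // Work on the unsigned magnitude so the most negative value is safe.
    std::uint64_t v = negative ? 0 - static_cast<std::uint64_t>(n)
                               : static_cast<std::uint64_t>(n);
    std::string out;
    while (v != 0) {
        out.push_back(digits[v % base]); // least significant digit first
        v /= base;
    }
    if (negative)
        out.push_back('-'); // assumption: negative numbers get a leading '-'
    std::reverse(out.begin(), out.end());
    return out;
}
```

For example, 255 in base 16 yields "ff" and 35 in base 36 yields "z".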

String String::headerCased() const

Returns a copy of this string where all letters have been changed to conform to typical mail header practice: Letters following digits and other letters are lower-cased. Other letters are upper-cased (notably including the very first character).
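
The casing rule can be sketched like this, using std::string and the C locale's idea of letters and digits (an assumption; the real function may classify characters differently).

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for headerCased(), written against std::string.
std::string headerCased(const std::string &in) {
    std::string out = in;
    bool followsAlnum = false; // was the previous character a letter or digit?
    for (char &c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u))
            c = static_cast<char>(followsAlnum ? std::tolower(u)
                                               : std::toupper(u));
        followsAlnum = std::isalnum(u) != 0;
    }
    return out;
}
```

Under these assumptions "content-type" becomes "Content-Type" and "MIME-VERSION" becomes "Mime-Version".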

String String::hex() const

Returns the lowercase-hexadecimal representation of the string.
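
A minimal sketch, assuming that each byte is rendered as exactly two lower-case hex digits. toHex is a hypothetical name for the std::string equivalent.

```cpp
#include <string>

// Hypothetical stand-in for hex(): two lower-case hex digits per byte.
std::string toHex(const std::string &in) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char u : in) {
        out.push_back(digits[u >> 4]);   // high nibble
        out.push_back(digits[u & 0x0f]); // low nibble
    }
    return out;
}
```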

String String::humanNumber( int64 n )

Returns n as a string representing that number in a human-readable fashion, optionally suffixed by K, M, G or T.

The number is rounded more or less correctly.

bool String::isQuoted( char c, char q ) const

Returns true if the string is quoted with c (default '"') as quote character and q (default '\') as escape character. c and q may be the same.

uint String::length() const

Returns the length of the string. The length does not include any terminator or padding.

String String::lower() const

Returns a copy of this string where all upper-case letters (A-Z - this is ASCII only) have been changed to lower case.

String String::mid( uint start, uint num ) const

Returns a string containing the data starting at position start of this string, extending for num bytes. num may be left out, in which case the rest of the string is returned.

If start is too large, an empty string is returned.
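
The bounds behaviour differs from std::string::substr(), which throws when the start position is past the end. A sketch of the forgiving variant on std::string (the free function mid() here is a hypothetical stand-in):

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for mid(): never throws, clamps instead.
std::string mid(const std::string &s, std::size_t start,
                std::size_t num = std::string::npos) {
    if (start >= s.size())
        return std::string(); // start too large: empty result
    return s.substr(start, num); // substr() clamps num to the remaining length
}
```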

bool String::needsQP() const

This function returns true if the string would need to be encoded using quoted-printable. It is a greatly simplified copy of eQP(), with the changes made necessary by RFC 2646.

uint String::number( bool * ok, uint base ) const

Returns the number encoded by this string, and sets *ok to true if that number is valid, or to false if the number is invalid. By default the number is encoded in base 10; if base is specified, that base is used. base must be at least 2 and at most 36.

If the number is invalid (e.g. negative), the return value is undefined.

If ok is a null pointer, it is ignored and no status is reported.
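
A sketch of the ok-pointer pattern on std::string. Several details are assumptions rather than documented behaviour: the empty string is treated as invalid, letters are accepted in either case, values that do not fit in an unsigned int are invalid, and 0 is returned for invalid input (the real return value is undefined in that case).

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical stand-in for number(), written against std::string.
unsigned number(const std::string &s, bool *ok = nullptr, unsigned base = 10) {
    bool good = !s.empty() && base >= 2 && base <= 36;
    unsigned long long value = 0;
    for (std::size_t i = 0; good && i < s.size(); ++i) {
        char c = s[i];
        int d = -1;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'z') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'Z') d = c - 'A' + 10;
        if (d < 0 || static_cast<unsigned>(d) >= base) {
            good = false; // not a digit in this base (covers '-' too)
            break;
        }
        value = value * base + static_cast<unsigned>(d);
        if (value > UINT_MAX)
            good = false; // does not fit in the return type
    }
    if (ok)
        *ok = good; // a null ok is simply skipped
    return good ? static_cast<unsigned>(value) : 0;
}
```

Callers that care about validity pass a bool and test it before using the result; callers that have already validated the input can leave ok out.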

String & String::operator=( const String & other )

Copies other to this string and returns a reference to this string.

String & String::operator=( const char * s )

Copies s to this string and returns a reference to this string. If s is a null pointer, the result is an empty string.

void String::operator delete( void * p )

Deletes p. (This function exists only so that gcc -O3 doesn't decide that String objects don't need destruction.)

void String::print() const

This function is a debugging aid. It prints the contents of the string within single quotes followed by a trailing newline to stderr.

String String::quoted( char c, char q ) const

Returns a version of this string quoted with c, and where any occurrences of c or q are escaped with q.
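
A sketch of the quoting rule on std::string. The default arguments are an assumption borrowed from isQuoted(); the defaults of the real quoted() are not stated here.

```cpp
#include <string>

// Hypothetical stand-in for quoted(), written against std::string.
std::string quoted(const std::string &s, char c = '"', char q = '\\') {
    std::string out(1, c); // opening quote
    for (char ch : s) {
        if (ch == c || ch == q)
            out.push_back(q); // escape quote and escape characters
        out.push_back(ch);
    }
    out.push_back(c); // closing quote
    return out;
}
```

With the assumed defaults, the three characters a"b become the six characters "a\"b".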

void String::reserve( uint num )

Ensures that there are at least num bytes available in this string. This implicitly causes the string to become modifiable and have a nonzero number of available bytes.

After calling reserve(), capacity() is at least as large as num, while length() has not changed.

void String::reserve2( uint num )

Equivalent to reserve(). reserve( num ) calls this function to do the heavy lifting. This function is not inline, while reserve() is, and calls to this function should be interesting wrt. memory allocation statistics.

No one except reserve() should call reserve2().

String String::section( const String & s, uint n )

Returns section n of this string, where a section is defined as a run of sequences separated by s. If s is the empty string or n is 0, section() returns this entire string. If this string contains fewer than n sections (i.e. section n is after the end of the string), section() returns an empty string.
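
A sketch of one reading of this description, on std::string. It assumes that sections are numbered from 1 and that a string with k separators has k+1 sections; the real function may differ in edge cases.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for section(), written against std::string.
std::string section(const std::string &text, const std::string &sep,
                    unsigned n) {
    if (sep.empty() || n == 0)
        return text; // documented special cases: the entire string
    std::size_t start = 0;
    for (unsigned i = 1; i < n; ++i) {
        std::size_t pos = text.find(sep, start);
        if (pos == std::string::npos)
            return std::string(); // section n is past the end
        start = pos + sep.size();
    }
    std::size_t end = text.find(sep, start);
    if (end == std::string::npos)
        return text.substr(start); // last section runs to the end
    return text.substr(start, end - start);
}
```

Under these assumptions, sections 1 to 3 of "a:b:c" with separator ":" are "a", "b" and "c", and section 4 is empty.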

void String::setLength( uint l )

Ensures that the string's length is l. If l is 0, the string will be empty after the function is called. If l is longer than the string used to be, the new part is uninitialised.

String String::simplified() const

Returns a copy of this string where each run of whitespace is compressed to a single ASCII 32, and where leading and trailing whitespace is removed altogether.
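
A sketch of the whitespace compression on std::string. Which bytes count as whitespace is an assumption; this version uses std::isspace() in the C locale.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for simplified(), written against std::string.
std::string simplified(const std::string &in) {
    std::string out;
    bool pendingSpace = false;
    for (unsigned char u : in) {
        if (std::isspace(u)) {
            // Remember the run, but never emit leading whitespace.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace)
                out.push_back(' '); // one ASCII 32 per run
            pendingSpace = false;
            out.push_back(static_cast<char>(u));
        }
    }
    return out; // a trailing run is never flushed, so it is dropped
}
```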

bool String::startsWith( const String & prefix ) const

Returns true if this string starts with prefix, and false if it does not.

bool String::startsWith( const char * prefix ) const

Returns true if this string starts with prefix, and false if it does not.

String String::stripCRLF() const

Returns a copy of this String with at most one trailing LF or CRLF removed. If there's more than one LF or CRLF, the remainder are left.
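
The "at most one" rule can be sketched on std::string as follows; this stripCRLF() is a hypothetical free-function stand-in.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for stripCRLF(), written against std::string.
std::string stripCRLF(const std::string &s) {
    std::size_t n = s.size();
    if (n >= 2 && s[n - 2] == '\r' && s[n - 1] == '\n')
        n -= 2; // one trailing CRLF
    else if (n >= 1 && s[n - 1] == '\n')
        n -= 1; // one trailing bare LF
    return s.substr(0, n);
}
```

Only the final line ending is removed, so a string that ends with two line feeds still ends with one afterwards.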

String String::stripWSP() const

Returns a copy of this string where leading and trailing spaces and tabs have been removed.

Despite its name, this function does NOT strip white space in general. Any internal white space is copied verbatim. CR and LF are not treated as white space.

void String::truncate( uint l )

Ensures that the string's length is either l or length(), whichever is smaller. If l is 0 (the default), the string will be empty after the function is called.

String String::unquoted( char c, char q ) const

Returns the unquoted representation of the string if it isQuoted(), and the string itself otherwise.

c at the start and end are removed; any occurrence of c within the string is left alone; an occurrence of q followed by c is converted into just c.

String String::upper() const

Returns a copy of this string where all lower-case letters (a-z - this is ASCII only) have been changed to upper case.

String String::wrapped( uint linelength, const String & firstPrefix, const String & otherPrefix, bool spaceAtEOL )

Returns a copy of this string wrapped so that each line contains at most linelength characters. The first line is prefixed by firstPrefix, subsequent lines by otherPrefix. If spaceAtEOL is true, each line except the last ends with a space.

The prefixes are counted towards line length, but the optional trailing space is not.

Only space (ASCII 32) is considered a line-break opportunity. Linefeeds added use CRLF.

String::~String()

Destroys the string.

Because String is used so much, and can eat up such vast amounts of memory so quickly, this destructor does something: If the string is the sole owner of its data, it frees them.

As of April 2005, the return values of data() or cstr() are NO LONGER valid after a string has gone out of scope or otherwise been lost.

This web page based on source code belonging to Oryx Mail Systems GmbH. All rights reserved.