Class EString.

Inherits Garbage

An email-oriented 8-bit string class.

The string data are counted, so null bytes are allowed, and most operations are very fast.

The data structure uses a simplified variant of reference counting, where only "one" and "many" are possible. The detach() function ensures that the count is "one" afterwards. Many functions leave the count on "many", even ones such as mid().

The usual string functions are implemented, along with a variety of email-specific operations such as eQP(), deQP(), needsQP() and e64(). boring() returns true if the string can be used unquoted in, e.g., MIME, and quoted() quotes it. upper() and lower() have a third sibling, headerCased(). simplified() and trimmed() remove whitespace in ways email often needs.

EString::EString( const EString & s )

Creates a copy of s.

EString::EString( const char * s )

Creates an EString from the NUL-terminated string s. The NUL is not copied.

EString::EString( const char * s, uint n )

Creates an EString from the first n bytes of s, which may contain NULs.

EString::EString()

Creates an empty EString.

Reimplements Garbage::Garbage().

EString EString::anonymised() const

Returns a copy of this string where most or all content has been replaced with the letter 'x' or the digit '4', but which, if the string was an RFC 822 message, keeps the same parse tree.

Specifically, most ASCII words are changed to xxxx, while most/all syntax elements are kept.

This function is very, very slow. That's okay, since it's only used for sending bug reports to us, and, as we all know, that's not a common case.

void EString::append( char c )

This version of append() appends the single character c.

void EString::append( const EString & other )

Appends other to this string.

void EString::append( const char * s )

This version of append() appends the null-terminated string s, or does nothing if s is null.

void EString::append( const char * base, uint num )

This version of append() appends num raw bytes from memory base. If base is null, this function does nothing.

void EString::appendNumber( int64 n, uint base )

Converts n to a number in base base and appends the result to this string. If n is 0, "0" is appended.

Uses lower-case for digits above 9.

bool EString::boring( Boring b ) const

Returns true if this string is really boring, and false if it's empty or contains at least one character that may warrant quoting in some context. So far, RFC 822 atoms, RFC 2822 atoms, IMAP atoms and MIME tokens are considered.

This function considers the intersection of those character classes to be the Totally boring subset. If b is not its default value, it may include other characters.

uint EString::capacity() const

Returns the capacity of the string variable, that is, how long the string can be before it has to allocate memory.

int EString::compare( const EString & other ) const

Returns -1 if this string is lexicographically before other, 0 if they are the same, and 1 if this string is lexicographically after other.

The comparison is case sensitive - just a byte comparison.
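The comparison rule can be illustrated with a minimal sketch. It uses std::string as a stand-in for EString and a hypothetical free function compareBytes(); it is not the Archiveopteryx code. It assumes bytes compare as unsigned values (which is what std::memcmp does), so bytes of 0x80 and above sort after ASCII.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch: case-sensitive, byte-by-byte comparison of two
// counted strings (embedded NULs allowed), returning -1, 0 or 1.
int compareBytes( const std::string & a, const std::string & b )
{
    std::size_t common = std::min( a.size(), b.size() );
    // std::memcmp compares the bytes as unsigned char.
    int r = std::memcmp( a.data(), b.data(), common );
    if ( r != 0 )
        return r < 0 ? -1 : 1;
    // Equal up to the shorter length: the shorter string sorts first.
    if ( a.size() != b.size() )
        return a.size() < b.size() ? -1 : 1;
    return 0;
}
```

Because this is a plain byte comparison, "B" sorts before "a" ('B' is 66, 'a' is 97).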

bool EString::contains( const EString & s ) const

Returns true if this string contains at least one instance of s.

bool EString::contains( const char c ) const

Returns true if this string contains at least one instance of c.

bool EString::containsWord( const EString & s ) const

Returns true if this string contains at least one instance of s, and the characters before and after the occurrence aren't letters.
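A minimal sketch of the word-boundary test, using std::string and a hypothetical free function. It assumes "letter" means ASCII a-z/A-Z, that the start and end of the string count as non-letters, and that an empty s never matches; none of these details is stated above.

```cpp
#include <cassert>
#include <string>

// Assumption: only ASCII letters count as letters.
static bool isAsciiLetter( char c )
{
    return ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' );
}

// Hypothetical sketch of containsWord(): true if s occurs with a non-letter
// (or the string boundary) immediately before and after it.
bool containsWord( const std::string & text, const std::string & s )
{
    if ( s.empty() )
        return false;   // assumption: an empty word never matches
    std::string::size_type i = text.find( s );
    while ( i != std::string::npos ) {
        std::string::size_type end = i + s.size();
        bool startOk = i == 0 || !isAsciiLetter( text[i - 1] );
        bool endOk = end == text.size() || !isAsciiLetter( text[end] );
        if ( startOk && endOk )
            return true;
        i = text.find( s, i + 1 );   // try the next (possibly overlapping) hit
    }
    return false;
}
```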

EString EString::crlf() const

Returns a copy of this string where every linefeed is CRLF, and where the last two characters are CRLF.
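A minimal sketch of the conversion, using std::string and a hypothetical free function. It assumes that an LF already preceded by CR is left alone, that lone CRs are copied unchanged, and that an empty input yields just CRLF.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of crlf(): bare LF becomes CRLF, and the result
// always ends with CRLF.
std::string crlf( const std::string & in )
{
    std::string out;
    out.reserve( in.size() + 2 );
    for ( std::string::size_type i = 0; i < in.size(); ++i ) {
        // Insert a CR before a bare LF; an existing CRLF is kept as-is.
        if ( in[i] == '\n' && ( i == 0 || in[i - 1] != '\r' ) )
            out += '\r';
        out += in[i];
    }
    // Make sure the last two characters are CRLF.
    if ( out.size() < 2 || out.compare( out.size() - 2, 2, "\r\n" ) != 0 )
        out += "\r\n";
    return out;
}
```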

const char * EString::cstr()

Returns the zero-terminated byte representation of the string. Note that even though the return value is zero-terminated, it can also contain null bytes in the middle.

Even though this function modifies memory, it doesn't detach(), since it doesn't modify the string. However, in most cases its call to reserve() causes a detach().

const char * EString::cstr() const

This const version of cstr() is the same as the non-const version above. The only difference is that it can be called on a const object, and that it may cause some memory allocation elsewhere.

const char * EString::data() const

Returns a pointer to the string's byte representation, which is NOT necessarily zero-terminated.

EString EString::de64() const

Decodes this string using the base-64 algorithm and returns the result.

EString EString::deQP( bool underscore ) const

Decodes this string according to the quoted-printable algorithm, and returns the result. Errors are overlooked, to cope with all the mail-munging brokenware in the great big world.

If underscore is true, underscores in the input are translated into spaces (as specified in RFC 2047).
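A minimal sketch of tolerant quoted-printable decoding, using std::string and a hypothetical free function deQP(). Assumptions: hex digits are accepted in either case, both "=CRLF" and "=LF" are treated as soft line breaks, and a malformed "=" escape is copied through unchanged (one way of overlooking errors, not necessarily the one EString uses).

```cpp
#include <cassert>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexValue( char c )
{
    if ( c >= '0' && c <= '9' ) return c - '0';
    if ( c >= 'A' && c <= 'F' ) return c - 'A' + 10;
    if ( c >= 'a' && c <= 'f' ) return c - 'a' + 10;   // tolerate lower case
    return -1;
}

// Hypothetical sketch of deQP(): decodes =XX escapes, drops soft line breaks
// and, if underscore is true, turns '_' into a space (RFC 2047).
std::string deQP( const std::string & in, bool underscore = false )
{
    std::string out;
    std::string::size_type i = 0;
    while ( i < in.size() ) {
        char c = in[i];
        if ( c == '=' && i + 1 < in.size() && in[i + 1] == '\n' ) {
            i += 2;   // soft line break "=LF"
        }
        else if ( c == '=' && i + 2 < in.size() &&
                  in[i + 1] == '\r' && in[i + 2] == '\n' ) {
            i += 3;   // soft line break "=CRLF"
        }
        else if ( c == '=' && i + 2 < in.size() &&
                  hexValue( in[i + 1] ) >= 0 && hexValue( in[i + 2] ) >= 0 ) {
            out += (char)( hexValue( in[i + 1] ) * 16 + hexValue( in[i + 2] ) );
            i += 3;
        }
        else {
            // Ordinary byte, or a malformed escape that is simply copied.
            out += ( underscore && c == '_' ) ? ' ' : c;
            ++i;
        }
    }
    return out;
}
```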

EString EString::deURI() const

Returns a version of this EString with every %xx escape replaced with the corresponding character (as used to encode URIs). Invalid escape sequences are left unchanged, so this function cannot be used for input from potentially malevolent sources.
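A minimal sketch of percent-decoding, using std::string and a hypothetical free function. It leaves invalid escapes (a "%" not followed by two hex digits) unchanged, as described, and assumes '+' is not translated to a space, since only %xx escapes are mentioned.

```cpp
#include <cassert>
#include <string>

// Value of a hex digit, or -1 if c is not one.
static int hexValue( char c )
{
    if ( c >= '0' && c <= '9' ) return c - '0';
    if ( c >= 'A' && c <= 'F' ) return c - 'A' + 10;
    if ( c >= 'a' && c <= 'f' ) return c - 'a' + 10;
    return -1;
}

// Hypothetical sketch of deURI(): replaces each valid %xx with its byte.
std::string deURI( const std::string & in )
{
    std::string out;
    std::string::size_type i = 0;
    while ( i < in.size() ) {
        if ( in[i] == '%' && i + 2 < in.size() &&
             hexValue( in[i + 1] ) >= 0 && hexValue( in[i + 2] ) >= 0 ) {
            out += (char)( hexValue( in[i + 1] ) * 16 + hexValue( in[i + 2] ) );
            i += 3;
        }
        else {
            out += in[i];   // ordinary byte or invalid escape: copy as-is
            ++i;
        }
    }
    return out;
}
```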

EString EString::deUue() const

An implementation of uudecode, sufficient to handle some occurrences of "content-transfer-encoding: x-uuencode" seen in practice. It may not be correct according to POSIX 1003.2b.

EString EString::decoded( Encoding e ) const

Returns a version of this EString decoded using e.

void EString::detach()

Ensures that the string is modifiable. All EString functions call this prior to modifying the string.

EString EString::e64( uint lineLength ) const

Encodes this string using the base-64 algorithm and returns the result in lines of at most lineLength characters. If lineLength is not supplied, e64() returns a single line devoid of whitespace.
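A minimal sketch of base-64 encoding with optional line wrapping, using std::string and a hypothetical free function e64(). It uses the standard RFC 2045 alphabet with '=' padding; the choice of CRLF as the line separator and the absence of a trailing line break are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of e64(). lineLength == 0 means one line with no
// whitespace; otherwise a CRLF (assumption) is inserted before any character
// that would make the current line longer than lineLength.
std::string e64( const std::string & in, unsigned lineLength = 0 )
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned column = 0;
    for ( std::string::size_type i = 0; i < in.size(); i += 3 ) {
        bool have2 = i + 1 < in.size();
        bool have3 = i + 2 < in.size();
        // Pack up to three input bytes into one 24-bit group.
        unsigned long group = (unsigned long)(unsigned char)in[i] << 16;
        if ( have2 )
            group |= (unsigned long)(unsigned char)in[i + 1] << 8;
        if ( have3 )
            group |= (unsigned char)in[i + 2];
        // Emit four 6-bit characters, padding missing bytes with '='.
        const char quad[4] = {
            alphabet[( group >> 18 ) & 63],
            alphabet[( group >> 12 ) & 63],
            have2 ? alphabet[( group >> 6 ) & 63] : '=',
            have3 ? alphabet[group & 63] : '='
        };
        for ( char c : quad ) {
            if ( lineLength && column == lineLength ) {
                out += "\r\n";
                column = 0;
            }
            out += c;
            ++column;
        }
    }
    return out;
}
```

de64() performs the inverse transformation.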

EString EString::eQP( bool underscore, bool from ) const

Encodes this string using the quoted-printable algorithm and returns the encoded version. In the encoded version, all line feeds are CRLF, and soft line feeds are positioned so that the q-p looks as good as it can.

Note that this function is slightly incompatible with RFC 2646: it encodes trailing spaces, as suggested in RFC 2045, but RFC 2646 suggests that if trailing spaces are the only reason to use q-p, the message should not be encoded.

If underscore is present and true, this function uses the variant of q-p specified by RFC 2047, where a space is encoded as an underscore and a few more characters need to be encoded.

If from is present and true, this function also makes sure that no output line starts with "From " or looks like a MIME boundary.

EString EString::eURI() const

Returns a version of this EString with absolutely nothing changed. (This function is eventually intended to percent-escape URIs, the opposite of deURI().)

EString EString::encoded( Encoding e, uint n ) const

Returns a version of this EString encoded using e. If e is Base64, n specifies the maximum line length; the default is 0, i.e. no limit.

This function does not support Uuencode. If e is Uuencode, it returns the input string.

bool EString::endsWith( const EString & suffix ) const

Returns true if this string ends with suffix, and false if it does not.

bool EString::endsWith( const char * suffix ) const

Returns true if this string ends with suffix, and false if it does not.

int EString::find( char c, int i ) const

Returns the position of the first occurrence of c on or after i in this string, or -1 if there is none.

int EString::find( const EString & s, int i ) const

Returns the position of the first occurrence of s on or after i in this string, or -1 if there is none.

EString EString::fromNumber( int64 n, uint base )

Returns a string representing the number n in the base system, which is 10 (decimal) by default and must be in the range 2-36.

For 0, "0" is returned.

For bases 11-36, lower-case letters are used for the digits beyond 9.
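A minimal sketch of the conversion, using std::string and a hypothetical free function fromNumber(). It checks the documented 2-36 range with assert, uses lower-case letters for digits above 9, and assumes (the text above does not say) that negative numbers get a leading '-'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of fromNumber(): digits 0-9, then lower-case a-z.
std::string fromNumber( std::int64_t n, unsigned base = 10 )
{
    assert( base >= 2 && base <= 36 );   // documented precondition
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    if ( n == 0 )
        return "0";
    // Take the magnitude in unsigned arithmetic so INT64_MIN is safe too.
    std::uint64_t m = n < 0 ? 0 - (std::uint64_t)n : (std::uint64_t)n;
    std::string out;
    while ( m ) {
        out += digits[m % base];   // least significant digit first
        m /= base;
    }
    if ( n < 0 )
        out += '-';                // assumption: a leading minus sign
    std::reverse( out.begin(), out.end() );
    return out;
}
```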

EString EString::headerCased() const

Returns a copy of this string where all letters have been changed to conform to typical mail header practice: Letters following digits and other letters are lower-cased. Other letters are upper-cased (notably including the very first character).
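A minimal sketch of the header-casing rule, using std::string and a hypothetical free function. Only ASCII letters and digits are considered, which is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of headerCased(): a letter right after a letter or
// digit is lower-cased; any other letter (e.g. the first one, or one after
// '-') is upper-cased.
std::string headerCased( const std::string & in )
{
    std::string out = in;
    bool afterAlnum = false;   // was the previous byte a letter or digit?
    for ( char & c : out ) {
        bool lower = c >= 'a' && c <= 'z';
        bool upper = c >= 'A' && c <= 'Z';
        if ( afterAlnum && upper )
            c = (char)( c - 'A' + 'a' );
        else if ( !afterAlnum && lower )
            c = (char)( c - 'a' + 'A' );
        afterAlnum = lower || upper || ( c >= '0' && c <= '9' );
    }
    return out;
}
```

For example, "content-type" becomes "Content-Type" and "MIME-VERSION" becomes "Mime-Version".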

EString EString::hex() const

Returns the lowercase-hexadecimal representation of the string.

EString EString::humanNumber( int64 n )

Returns n as a string representing that number in a human-readable fashion, optionally suffixed by K, M, G or T.

The number is rounded more or less correctly.

bool EString::isQuoted( char c, char q ) const

Returns true if the string is quoted with c (default '"') as quote character and q (default '\') as escape character. c and q may be the same.

uint EString::length() const

Returns the length of the string. The length does not include any terminator or padding.

EString EString::lower() const

Returns a copy of this string where all upper-case letters (A-Z - this is ASCII only) have been changed to lower case.

EString EString::mid( uint start, uint num ) const

Returns a string containing the data starting at position start of this string, extending for num bytes. num may be left out, in which case the rest of the string is returned.

If start is too large, an empty string is returned.

bool EString::needsQP() const

This function returns true if the string would need to be encoded using quoted-printable. It is a greatly simplified copy of eQP(), with the changes made necessary by RFC 2646.

uint EString::number( bool * ok, uint base ) const

Returns the number encoded by this string, and sets *ok to true if that number is valid, or to false if it is invalid. By default the number is parsed in base 10; if base is specified, that base is used instead. base must be at least 2 and at most 36.

If the number is invalid (e.g. negative), number() returns 0.

If ok is a null pointer, it is not modified.
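A minimal sketch of the parsing contract, using std::string and a hypothetical free function number(). Assumptions not stated above: the empty string is invalid, letter digits are accepted in either case, and a value that does not fit in unsigned is treated as invalid.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical sketch of number(): returns 0 and sets *ok to false on any
// invalid input; leaves ok untouched if it is a null pointer.
unsigned number( const std::string & s, bool * ok = nullptr, unsigned base = 10 )
{
    assert( base >= 2 && base <= 36 );     // documented precondition
    unsigned long long n = 0;
    bool good = !s.empty();                // assumption: "" is invalid
    for ( char c : s ) {
        int d = -1;
        if ( c >= '0' && c <= '9' )
            d = c - '0';
        else if ( c >= 'a' && c <= 'z' )
            d = c - 'a' + 10;
        else if ( c >= 'A' && c <= 'Z' )
            d = c - 'A' + 10;              // assumption: case-insensitive
        if ( d < 0 || d >= (int)base ) {   // also rejects a leading '-'
            good = false;
            break;
        }
        n = n * base + d;
        if ( n > UINT_MAX ) {              // assumption: overflow is invalid
            good = false;
            break;
        }
    }
    if ( ok )
        *ok = good;
    return good ? (unsigned)n : 0;
}
```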

EString & EString::operator=( const EString & other )

Copies other to this string and returns a reference to this string.

EString & EString::operator=( const char * s )

Copies s to this string and returns a reference to this string. If s is a null pointer, the result is an empty string.

void EString::operator delete( void * p )

Deletes p. (This function exists only so that gcc -O3 doesn't decide that EString objects don't need destruction.)

void EString::prepend( const EString & other )

Prepends other to this string.

void EString::print() const

This function is a debugging aid. It prints the contents of the string within single quotes followed by a trailing newline to stderr.

EString EString::quoted( char c, char q ) const

Returns a version of this string quoted with c, and where any occurrences of c or q are escaped with q.
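A minimal sketch of the quoting rule, using std::string and a hypothetical free function quoted(). The default arguments '"' and '\' are an assumption borrowed from the documented defaults of isQuoted().

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of quoted(): wraps the string in c and escapes every
// c or q inside it with q. Works when c and q are the same character.
std::string quoted( const std::string & in, char c = '"', char q = '\\' )
{
    std::string out;
    out.reserve( in.size() + 2 );
    out += c;
    for ( char ch : in ) {
        if ( ch == c || ch == q )
            out += q;   // escape the quote or the escape character
        out += ch;
    }
    out += c;
    return out;
}
```

With c and q both set to a single quote, "it's" becomes 'it''s', the SQL-style doubling convention.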

void EString::replace( const EString & a, const EString & b )

Replaces all occurrences of a in this string with b. Rather slow and allocates much memory. Could be optimised if it ever shows up on the performance graphs.

a must not be empty.

Replaced sections are not considered when looking for the next match.
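A minimal sketch of non-overlapping replacement, using std::string and a hypothetical free function replaceAll(). Unlike EString::replace(), which modifies the string in place, this sketch returns a new string. Resuming the search after each match is what keeps replaced sections from being rescanned.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: replaces every non-overlapping occurrence of a with b,
// scanning left to right and never looking inside inserted copies of b.
std::string replaceAll( const std::string & in, const std::string & a,
                        const std::string & b )
{
    assert( !a.empty() );   // as documented: a must not be empty
    std::string out;
    std::string::size_type pos = 0;
    std::string::size_type hit;
    while ( ( hit = in.find( a, pos ) ) != std::string::npos ) {
        out.append( in, pos, hit - pos );   // text before the match
        out += b;
        pos = hit + a.size();               // continue after the match
    }
    out.append( in, pos, std::string::npos );
    return out;
}
```

For instance, replacing "a" with "aa" in "aaa" yields "aaaaaa" rather than looping forever.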

void EString::reserve( uint num )

Ensures that at least num bytes are available in this string. This implicitly causes the string to become modifiable and have a nonzero number of available bytes.

After calling reserve(), capacity() is at least as large as num, while length() has not changed.

void EString::reserve2( uint num )

Equivalent to reserve(). reserve( num ) calls this function to do the heavy lifting. This function is not inline, while reserve() is, so calls to this function should be interesting with respect to memory allocation statistics.

No one except reserve() should call reserve2().

EString EString::section( const EString & s, uint n ) const

Returns section n of this string, where sections are the substrings separated by s. If s is the empty string or n is 0, section() returns this entire string. If section n lies beyond the end of the string (i.e. the string contains too few instances of s), section() returns an empty string.
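A minimal sketch of section(), using std::string and a hypothetical free function. It assumes sections are numbered from 1, so section 1 is the text before the first separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of section(): sections are numbered from 1
// (assumption); n == 0 or an empty separator returns the whole string.
std::string section( const std::string & in, const std::string & s,
                     unsigned n )
{
    if ( s.empty() || n == 0 )
        return in;
    std::string::size_type start = 0;
    // Skip past the first n-1 separators.
    for ( unsigned i = 1; i < n; ++i ) {
        std::string::size_type hit = in.find( s, start );
        if ( hit == std::string::npos )
            return "";   // section n is beyond the end of the string
        start = hit + s.size();
    }
    std::string::size_type end = in.find( s, start );
    return end == std::string::npos ? in.substr( start )
                                    : in.substr( start, end - start );
}
```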

void EString::setLength( uint l )

Ensures that the string's length is l. If l is 0, the string will be empty after the function is called. If l is longer than the string used to be, the new part is uninitialised.

EString EString::simplified() const

Returns a copy of this string where each run of whitespace is compressed to a single ASCII 32, and where leading and trailing whitespace is removed altogether.
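A minimal sketch of the whitespace compression, using std::string and a hypothetical free function. Which characters count as whitespace is not stated above; this sketch assumes space, tab, CR and LF.

```cpp
#include <cassert>
#include <string>

// Assumption: space, tab, CR and LF count as whitespace.
static bool isSpace( char c )
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Hypothetical sketch of simplified(): each inner whitespace run becomes one
// ASCII 32; leading and trailing whitespace disappears.
std::string simplified( const std::string & in )
{
    std::string out;
    bool pendingSpace = false;
    for ( char c : in ) {
        if ( isSpace( c ) ) {
            pendingSpace = !out.empty();   // leading whitespace is dropped
        }
        else {
            if ( pendingSpace )
                out += ' ';
            out += c;
            pendingSpace = false;
        }
    }
    return out;   // a trailing run never emits its pending space
}
```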

bool EString::startsWith( const EString & prefix ) const

Returns true if this string starts with prefix, and false if it does not.

bool EString::startsWith( const char * prefix ) const

Returns true if this string starts with prefix, and false if it does not.

EString EString::stripCRLF() const

Returns a copy of this EString with at most one trailing LF or CRLF removed. If the string ends with more than one LF or CRLF, only the last one is removed.
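A minimal sketch using std::string and a hypothetical free function, showing that exactly one trailing line ending is removed and that a lone trailing CR is assumed to be kept.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of stripCRLF(): removes one trailing CRLF or LF.
std::string stripCRLF( const std::string & in )
{
    std::string::size_type n = in.size();
    if ( n >= 2 && in[n - 2] == '\r' && in[n - 1] == '\n' )
        return in.substr( 0, n - 2 );   // trailing CRLF
    if ( n >= 1 && in[n - 1] == '\n' )
        return in.substr( 0, n - 1 );   // trailing bare LF
    return in;                          // nothing to strip
}
```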

EString EString::trimmed() const

Returns a copy of this string where leading and trailing whitespace have been removed.

void EString::truncate( uint l )

Ensures that the string's length is either l or length(), whichever is smaller. If l is 0 (the default), the string will be empty after the function is called.

EString EString::unquoted( char c, char q ) const

Returns the unquoted representation of the string if it isQuoted(), and the string itself otherwise.

The c at the start and the c at the end are removed; any other occurrence of c within the string is left alone; an occurrence of q followed by c is converted into just c.

EString EString::upper() const

Returns a copy of this string where all lower-case letters (a-z - this is ASCII only) have been changed to upper case.

EString EString::wrapped( uint linelength, const EString & firstPrefix, const EString & otherPrefix, bool spaceAtEOL ) const

Returns a copy of this string wrapped so that each line contains at most linelength characters. The first line is prefixed by firstPrefix, subsequent lines by otherPrefix. If spaceAtEOL is true, all lines except the last end with a space.

The prefixes are counted towards line length, but the optional trailing space is not.

Only space (ASCII 32) is a line-break opportunity. If there are multiple spaces where a line is broken, all the spaces are replaced by a single CRLF. Linefeeds added use CRLF.

EString::~EString()

Destroys the string.

Because EString is used so much, and can eat up such vast amounts of memory so quickly, this destructor does something: If the string is the sole owner of its data, it frees them.

As of April 2005, the return values of data() or cstr() are NO LONGER valid after a string has gone out of scope or otherwise been lost.

This web page is based on source code belonging to The Archiveopteryx Developers. All rights reserved.