1.10. Collections

In addition to the numeric, Boolean, and character types, C++ also offers built-in collection types. A collection data type is a grouping of some number of other data items (possibly only zero or one) that have some shared significance or need to be operated upon together.

Arrays, vectors, strings, sets, and hash tables are among these useful C++ collection types.

1.10.1. Arrays

What is an Array? An array is a data structure consisting of an ordered collection of data elements, of identical type in which each element can be identified by an array index. More technically, an array data structure is an ordered arrangement of values located at equally spaced addresses in contiguous computer memory.

A C++ array is always stored in contiguous memory. C++ arrays can be allocated in two different ways:

  1. statically allocated in which the array size is fixed at compile-time and cannot change
  2. dynamically allocated in which pointers are used in the allocation process so the size can change at run-time

In modern C++, the statically allocated array is typically used in situations when speed is essential or where hardware constraints exist, and a data structure called a vector is typically used when more flexibility is required.

Because C++ stores variables directly, each element of a C++ array must be of identical data type. The indices for arrays start counting with 0.

The use of arrays permits us to utilize an ordered set of memory locations that we can then manipulate as a single entity, and that at the same time gives us direct access to each individual component.

Why use an Array?

C++ is a language often used for real-time or low-level processing where speed is essential and/or there is only a fixed amount of space available.

The fact that array elements are stored in memory in contiguous memory locations making look-up via index very, very fast. In computing, a word is the unit of data used by a particular processor design, such as 32 or 64 bits. For example, an array of 100 integer variables, with indices 0 through 99, might be stored as 100 words at memory addresses 20000, 20004, 20008, … 20396. The element with index i would be located at the address 20000 + 4 × i.

Statically allocated C++ arrays must be declared with both a type and a size at compile-time.

double darray[4];
int iarray[10];
char arr2[3000];

It is also possible to initialize statically allocated arrays at compile time, in which case the number of items determines the array’s size.

int arr[] = {1, 2, 3, 4};  // This is size 4.
char arr2[] = {'a', 'b', 'c'}; // This is size 3.
string arr3[] = {"this", "is", "an", "array"}; // This is size 4.

Note that an array with a single element is not the same type as the atomic type, so the following are not the same.

double darray[] = {1.2};  // must use index to access value
double ddouble = 1.2;     // can't use index to access value

Be Cautious with Arrays

The speed and low-level control that arrays offer us as programmers is powerful… and dangerous. C+ is designed for speed, and using a C++ array will help you better understand the trade-offs inherent in this.

Here are examples of iteration.

C++ is designed for speed. C++ will not only let you iterate beyond either end of an array, but it will let you change the values beyond either end of the array with sometimes catastrophic results.

The phrase, “be careful what you wish for” is a great one to remember when programming in C++. Because C++ will generally try to do everything you ask for.

The speed of C++ comes at the cost of minimal to no error checking. Sometimes this can have perplexing results such as in the next example.

You should use an array when you have a need for speed or you need to work with hardware constraints. Otherwise, you may want to consider using another collection data type, the vector.

    Q-1: In the above example, what happened to otherdata[ ] in C++?
  • Nothing. Everything is fine.
  • Actually, there is a problem. Look carefully.
  • All data was automatically reinitialized.
  • No. C++ just does what you tell it to do.
  • I have no idea. Please give me a hint.
  • Try again. One of these is indeed correct. Look at the memory addresses.
  • The first loop went out of bounds and wrote over the values in otherdata.
  • Right!
  • none of the above
  • One of the above is indeed correct.

1.10.2. Vectors

Vectors use a dynamically allocated array to store their elements, so they can change size, and they have other friendly features as well. Because they use a dynamically allocated array, they use contiguous storage locations which means that their elements can be accessed and traversed, and they can also be accessed randomly using indexes. However, vectors are dynamically sized, so their size can change automatically. A new element can be inserted into or deleted from any part of a vector, and automatic reallocation for other existing items in the vector will be applied. Vectors are homogeneous, so every element in the vector must be of the same type.

Vectors are a class that is available through a library called the Standard Template Library (STL), and one uses a < > notation to indicate the data type of the elements. In order to use vectors, One needs to include the vector library.

#include <vector>
Common C++ Vector Operators
Vector Operation Use Explanation
[ ] myvector[i] access value of element at index i
= myvector[i]=value assign value to element at index i
push_back myvect.push_back(item) Appends item to the far end of the vector
pop_back myvect.pop_back() Deletes last item (from far end) of the vector
insert myvect.insert(i, item) Inserts an item at index i
erase myvect.erase(i) Erases an element from index i
size myvect.size() Returns the actual size used by elements
capacity myvect.capacity() Returns the size of allocated storage capacity
reserve myvect.reserve(amount) Request a change in capacity to amount

A very common programming task is to grow a vector using the push_back() method to append to the vector as we see in the next example. Because vectors can change size, vectors typically allocate some extra storage to accommodate for possible growth. Thus the vector typically has an actual capacity greater than the storage size strictly needed to contain its elements.

In the above example, the use of reserve was optional. However, it is a good idea to use it before growing a vector in this way because it will save time. Because vectors are stored in underlying arrays which require contiguous memory, every time the vector’s size gets too large for the capacity, the entire vector must be moved to a larger location in memory, and all that copying takes time. In a typical implementation, the capacity is doubled each time. as in the example that follows.

Remembering that C++ is designed for speed, not protection, we will likely not be surprised by the following:

    Q-2: Which of the following is the biggest difference between a C++ array and a C++ vector?
  • Vectors can change size.
  • Right! Good job!
  • Vectors offer many more features and protections than arrays.
  • Not all of the protections of lists are offered by vectors; one can still iterate off of either end.
  • Vectors don't use contiguous memory, so elements can be inserted.
  • No. Although elements can be inserted in vectors, they do require contiguous memory.
  • more than one of the above
  • No. Only one of the above is correct.
  • none of the above
  • One of the above is indeed correct.
    Q-3: What good is the reserve method in a vector?
  • Nothing. It is completely optional.
  • It is optional but it does serve a purpose. Try again.
  • Using it will save time if you know the maximum size needed.
  • Right!
  • It is required so memory can be allocated.
  • It is not required.
  • none of the above
  • One of the above is indeed correct.

1.10.3. Strings

Strings are sequential collections of zero or more characters such as letters, numbers and other symbols. There are actually two types of strings in C++ . The C++ string or just string from the <string> library is the more modern type. The old style C-string which is essentially an array of char type. The char type itself is actually distinct from both types of strings.

char cppchar = 'a';   // char values use single quotes
string cppstring = "Hello World!";  // C++ strings use double quotes
char cstring[] = {"Hello World!"};    // C-string or char array uses double quotes

In older versions of C++, you must use a char array to work with filenames. In modern C++ (from C++11 onward), however, you can use a C++ string for everything. Since C++ strings are so much nicer, I would not recommend using C-strings unless you have a reason.

Because strings are sequences, all of the typical sequence operations work as you would expect. In addition, the string library offers a huge number of methods, some of the most useful of which are shown in Table 4.

Table 4: String Methods Provided in C++
Method Name Use Explanation
[ ] astring[i] access value of character at index i
= astring[i]=value change value of character at index i
+ string1 + astring2 concatenate strings
append astring.append(string) Append to string the end of the string
push_back astring.push_back(char) Appends a character to the end of the string
pop_back astring.pop_back() Deletes the last character from the end of the string
insert astring.insert(i, string) Inserts a string at a specific index
erase astring.erase(i, j) Erases an element from one index to another
find astring.find(item) Returns the index of the first occurrence of item
size astring.size() Returns the size of the string

Check your understanding by completing the following question.

    Q-4: Drag each data type to its' corresponding C++ initialization syntax. Feedback shows incorrect matches.
  • char
  • 'a'
  • char array
  • {'a'}
  • string
  • "a"

1.10.4. Hash Tables

A hash table is a collection of associated pairs of items where each pair consists of a key and a value. Hash tables are often called the more general term map because the associated hash function “maps” the key to the value. Nevertheless, it is better to use the more precise term, hash table because other kinds of maps are sometimes implemented with a different underlying data structure.

Each hash table has a hash function which given the key as input to the hash function returns the location of the associated value as the output. This makes look up fast.

In C++, the unordered_map implements the hash table, and the <unordered_map> library must be included as follows:

#include <unordered_map>

The syntax for hash table access is much like we might expect except that instead of using the index of the item for look-up, we use the key. Hash tables can be initialized with key-value pairs and key-value pairs can also be added later as we see in the following example. In C++, the keyword first is used for the key, and second is used for the associated value.

It is important to note that hash tables are organized by the location given by the hash function rather than being in any particular order with respect to the keys. This makes look-up extremely fast. Hence, although it is possible to iterate through a hash table, it is an odd thing to do because the data is not typically stored sequentially. Iterators of an unordered_map are implemented using pointers to point to elements of the value type as we see in the following example.

Hash Tables have both methods and operators. Table 7 describes them, and the session shows them in action.

Table 7: Important Hash Table Operators Provided in C++
Operator Use Explanation
[ ] mymap[k] Returns the value associated with k, otherwise throws error
count mymap.count(key) Returns true if key is in mymap, false otherwise
erase mymap.erase(key) Removes the entry from mymap
begin mymap.begin() An iterator to the first element in mymap
end mymap.end(key) An iterator pointing to past-the-end element of mymap

1.10.5. Unordered Sets

An unordered_set is an unordered collection of zero or more unique C++ data values of a particular type. To use unordered_sets, you import unordered_set from the Standard template library with #include <unorderd_set>.

Unordered_sets allow for fast retrieval of individual elements based on their value. In an unordered_set, the value of an element is at the same time its key, that identifies it uniquely. Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - However, they can be inserted and removed.

Unordered sets do not allow duplicates and are initialized using comma-delimited values enclosed in curly braces. The collection can be assigned to a variable as shown below.

set<int> mySet = {3, 6, 4, 78, 10}

Unordered sets support a number of methods that should be familiar to those who have worked with sets in a mathematics setting. Table 6 provides a summary. Examples of their use follow.

Table 6: Methods Provided by Sets in C++
Method Name Use Explanation
union set_union() Returns a new set with all elements from both sets
intersection set_intersection() Returns a new set with only those elements common to both sets
difference set_difference() Returns a new set with all items from first set not in second
add aset.insert(item) Adds item to the set
remove aset.erase(item) Removes item from the set
clear aset.clear() Removes all elements from the set
    Q-5: Which C++ structure is the best choice for a group of ordered data of a fixed length?
  • array
  • Correct!
  • hash table
  • No. hash tables are not ordered.
  • string
  • A string would only work for character data. Try again.
  • vector
  • There is a better choice given that the group is fixed length
  • more than one of the above
  • Only of the above is best.
    Q-6: Drag each data type to its' corresponding C++ initialization syntax. Feedback shows incorrect matches.
  • Array
  • {“What”, “am”, “I”, "am"}
  • Set
  • {“What”, “am”, “I”}
  • String
  • “What am I”
  • Hash Table
  • {{“What”, “am I”}}
Next Section - 1.11. Defining C++ Functions