Interators are usually implemented using signed integers like the typical "for (int i=0; ..." and in fact is the type used indexing "cstr[i]", most of methods use the signed int, int by default is signed.
Nevertheless, the "std::string::operator[]" index is size_t which is unsigned, and so does size(), and same happens with vectors.
Besides the operator[] lack of negative index control, I will explain this later.
Do the compilers doesn't warn about this?
If his code got a large input it would index a negative numer, let see g++ and clang++ warnings:
No warnings so many bugs out there...
In order to reproduce the crash we can load a big string or vector from file, for example:
I've implemented a loading function, getting the file size with tellg() and malloc to allocate the buffer, then in this case used as a string.
Let see how the compiler write asm code based on this c++ code.
So the string constructor, getting size and adding -2 is clear. Then come the operator<< to concat the strings.
Then we see the operator[] when it will crash with the negative index.
In assembly is more clear, it will call operator[] to get the value, and there will hapen the magic dereference happens. The operator[] will end up returning an invalid address that will crash at [RAX]
In gdb the operator[] is a allq 0x555555555180 <_znst7__cxx1112basic_stringicst11char_traitsicesaiceeixem plt="">
(gdb) i r rsi
rsi 0xfffffffffffefffe -65538
The implmementation of operator ins in those functions below:
(gdb) bt
#0 0x00007ffff7feebf3 in strcmp () from /lib64/ld-linux-x86-64.so.2
#1 0x00007ffff7fdc9a5 in check_match () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7fdce7b in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#3 0x00007ffff7fdd739 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#4 0x00007ffff7fe1eb7 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#5 0x00007ffff7fe88ee in _dl_runtime_resolve_xsavec () from /lib64/ld-linux-x86-64.so.2
#6 0x00005555555554b3 in main (argc=2, argv=0x7fffffffe118) at main.cpp:29
29 cout << "penultimate byte is " << hex << s[i] << endl;
(gdb)
What about negative indexing in std::string::operator[] ?
It's exploitable!
In a C char array is known that having control of the index, we can address memory.
Let's see what happens with C++ strings:
The operator[] function call returns the address of string plus 10, and yes, we can do abitrary writes.
Note that gdb displays by default with at&t asm format wich the operands are in oposite order:
And having a string that is in the stack, controlling the index we can perform a write on the stack.
To make sure we are writing outside the string, I'm gonna do 3 writes:
The beginning of the std::string object is 0x7fffffffde50.
Write -10 writes before the string 0x7fffffffde46.
And write -100 segfaults because is writting in non paged address.
So, C++ std::string probably is not vulnerable to buffer overflow based in concatenation, but the std::string::operator[] lack of negative indexing control and this could create vulnerable and exploitable situations, some times caused by a signed used of the unsigned std::string.size()