A note on strcpy

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Post by User923005 » Tue Nov 26, 2013 1:32 am

Source and destination cannot overlap. Though this has been the case since the beginning, apparently it is not as widely known as it ought to be. This is from the very first C standard (C89) {bolding is mine}:

4.11.2.3 The strcpy function

Synopsis

#include <string.h>
char *strcpy(char *s1, const char *s2);

Description

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1 . If
copying takes place between objects that overlap, the behavior is
undefined.

Returns

The strcpy function returns the value of s1 .

Re: A note on strcpy

Post by User923005 » Tue Nov 26, 2013 1:40 am

P.S.
The same thing is true for memcpy():

4.11.2.1 The memcpy function

Synopsis

#include <string.h>
void *memcpy(void *s1, const void *s2, size_t n);

Description

The memcpy function copies n characters from the object pointed to
by s2 into the object pointed to by s1 . If copying takes place
between objects that overlap, the behavior is undefined.

Returns

The memcpy function returns the value of s1 .

And so the C library has an overlap-safe version called memmove():

4.11.2.2 The memmove function

Synopsis

#include <string.h>
void *memmove(void *s1, const void *s2, size_t n);

Description

The memmove function copies n characters from the object pointed to
by s2 into the object pointed to by s1 . Copying takes place as if
the n characters from the object pointed to by s2 are first copied
into a temporary array of n characters that does not overlap the
objects pointed to by s1 and s2 , and then the n characters from the
temporary array are copied into the object pointed to by s1 .

Returns

The memmove function returns the value of s1 .

I had a debate with a fellow programmer who said that memcpy() always copies correctly even when the memory regions overlap. I said that there is no guarantee of correct operation. He said, show me one single implementation that does not handle overlap safely, and I could not think of one. Turns out that some beta test sites got core dumps at that address. At least it was an easy fix. No perceptible difference in speed, either.
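A minimal sketch of the kind of fix involved, assuming the overlapping copy shifts the tail of a string toward its start. The helper name drop_prefix is hypothetical and not taken from any code discussed here; it only illustrates swapping strcpy() for memmove() with an explicit length:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drop the first `skip` characters of `s` in place.
 * Source and destination overlap, so memmove() is used, not strcpy(). */
char *drop_prefix(char *s, size_t skip)
{
    size_t len = strlen(s);
    assert(skip <= len);   /* caller must not skip past the terminator */
    /* Copy the remaining tail plus its terminating null character. */
    memmove(s, s + skip, len - skip + 1);
    return s;
}
```

Called as drop_prefix(buf, 3) on a buffer holding "1. e4 e5", this leaves "e4 e5" in buf. The only extra cost over strcpy() is the strlen() pass.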

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham

Re: A note on strcpy

Post by hyatt » Tue Nov 26, 2013 5:34 am

If you read the old glibc source code, you will note that there is only ONE overlap that is a problem. Where the second argument points to the same string as the first argument, but to an earlier point (strcpy(string+4, string);). That's a problem. If the second argument points to the same string, but farther down by any amount, this "undefined" behavior isn't a problem. Until Apple made it a problem with Mavericks.

It broke my PGN parsing code that has been around forever. Code that has worked on EVERY operating system and compiler known to man. Until Mavericks. And even worse, when you do the reasonable overlap approach on Mavericks, it displays "Abort" and execution stops. No explanation. No nothing. Until you learn about the rather bizarre logging mechanism used to write explanations in ~/Library/... which the typical user won't have a clue about, nor should they.

Re: A note on strcpy

Post by User923005 » Tue Nov 26, 2013 7:22 am

That is not the only system where it does not work.
At any rate, since the standard clearly specifies that it is undefined behavior, relying on it to do something that you consider sensible is no different from using an uninitialized pointer, or performing a memcpy beyond the bounds of a data object.
According to the C language standard, it would be permissible for using an overlapped strcpy to cause Scott Nudds to come flying out of your left nostril.

Re: A note on strcpy

Post by hyatt » Tue Nov 26, 2013 5:50 pm

Please cite a specific example. I will try to list the systems this has worked on.

1. IBM System/390 and later mainframes.
2. IBM RS workstations using AIX or Linux, with the IBM compiler and gcc.
3. Intel x86 and x86-64 using gcc, Intel icc, and MSVC, on Windows through the current version and all flavors of Linux, including lightweight kernels.
4. DEC Alpha using DEC's compiler and gcc, running under VMS, Unix, Linux, BSD Unix, etc.
5. MIPS using both gcc and SGI's compiler, on IRIX.
6. Cray running UNICOS or COS, using Cray's C compiler or gcc (original Cray-1 architecture). Also Crays using Alpha processors.
7. NeXT.
8. Apple OS X through Mountain Lion.
9. HP's 64-bit processor, operating system, and compiler.
10. Sun SPARC running Solaris or Linux, with Sun's C compiler or gcc.

Need I go on?

Not one single failure on any machine built or compiler written until Mavericks decided to abort when the source/destination overlaps. The copy didn't break even then, because it simply must work as it was used. The development guys decided to insert a hack to make it not work, for reasons unknown. They are getting a lot of heat because it has broken a LOT of programs... without any valid reason for doing so.

There is a HUGE difference between the overlapped strcpy() I did and using an uninitialized pointer. And one can STILL cause strcpy to crash with non-overlapping pointers, so what is the point of catching one thing that has been harmless and effective for years while STILL having a serious security issue when the source is longer than the destination, which strcpy() has absolutely no clue about? It was a stupid decision. I suspect it will be undone pretty soon. As it should be.
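For the separate problem of a source longer than its destination, one possible guard is sketched below. The helper copy_if_fits and its return convention are assumptions made for illustration, not a standard function, and it still requires that the two buffers do not overlap:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator included.
 * Returns 0 on success, -1 if dst is too small (dst is then left unchanged).
 * The buffers must not overlap, because memcpy() is used for the copy. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t need = strlen(src) + 1;   /* characters plus the null terminator */
    if (need > dst_size)
        return -1;
    memcpy(dst, src, need);
    return 0;
}
```

Unlike strcpy(), the caller has to pass the destination size, which is exactly the information strcpy() never receives.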

Feel free to quote a system where it doesn't work. So far as I know, Crafty has run on every architecture and compiler on planet earth, including less well-known boxes like the Hitachi, Cray-2 and Cray-3, an old Cyber 176, Fujitsu, etc. There was no reason to break this.

Re: A note on strcpy

Post by User923005 » Tue Nov 26, 2013 8:46 pm

You need not go on. Apparently, you do not understand what undefined behavior means. One thing it can mean is that broken code can appear to work.
We have fixed problems in our code base caused by overlapping strcpy and memcpy which had nothing to do with Apple operating systems.

For the links below, add http: or https: to the front if http: does not work. Idiot forum software thinks I am trying to spam.

But, so that you can easily find things for yourself and verify them, consider:
//bugs.launchpad.net/ubuntu/+source/eglibc/+bug/810739
//sourceware.org/bugzilla/show_bug.cgi?id=14011
//lists.ubuntu.com/archives/foundations-bugs/2011-July/011209.html
//meta-coding.blogspot.com/2011/07/overlapped-strcpy.html
//groups.google.com/forum/#!topic/comp.lang.c/MlGOWzH-U0k
//bugzilla.redhat.com/show_bug.cgi?id=208181
//tbamud.com/forum/4-development/3685-bug-found-in-get-number-strcpy-overlapping-strings
//gmt.soest.hawaii.edu/issues/431
//lists.gnu.org/archive/html/bug-make/2009-07/msg00018.html
//gcc.gnu.org/bugzilla/show_bug.cgi?id=12014
//lists.gnu.org/archive/html/bug-make/2009-08/msg00005.html
//icculus.org/pipermail/quake3-bugzilla/2008-March/000511.html
//us.generation-nt.com/answer/bug-615880-bash-uses-strcpy-overlapping-strings-help-204447211.html
//bugs.debian.org/cgi-bin/bugreport.cgi?bug=422826
//lists.fedoraproject.org/pipermail/scm-commits/2011-September/663145.html
//sourceforge.net/p/ctags/bugs/284/
//lists.samba.org/archive/samba-vms/2004-October/001326.html
//code.google.com/p/chromium/issues/detail?id=157615
//bugs.freedesktop.org/show_bug.cgi?id=56091
//cygwin.com/ml/gdb-patches/2012-04/msg00970.html
//list-archive.xemacs.org/pipermail/xemacs-beta/2012-October/023305.html
//lists.x.org/archives/xorg-devel/2013-February/035424.html
//ehc.ac/p/dspam/bug-tracker/163/
//www.archivum.info/comp.soft-sys.ace/201 ... avior.html
//groups.yahoo.com/neo/groups/vimdev/conversations/topics/48411
There are hundreds of others besides these. Easily found with a google search.

Here is a nice explanation of what can sensibly go wrong:
//forums.oracle.com/message/10102956

Here is a description of how Apple did you a tremendous favor:
//sourceware-org.1504.n7.nabble.com/Bug-libc-16004-New-memcpy-strcpy-detect-memory-overlap-and-crash-when-error-is-detected-td246316.html

Think about this for a moment:
You have a serious error in your code that can cause undetectable problems. Can you think of any better solution than abort() which will give you a core dump, so that gdb will take you directly to the problem line?

I would say that Apple's solution is as close to perfect as possible, except that they did not automatically correct the bug for you and write your repaired software to disk.

Re: A note on strcpy

Post by hyatt » Tue Nov 26, 2013 9:36 pm

So, the case is REALLY "there is only a problem if the source string overlaps the destination string, where the destination address is > the source address." That can be verified by looking at ANY glibc strcpy() implementation. I don't do that. Explicitly. Because I know it won't work. Overlapping in the other direction is perfectly functional on ALL architectures. ALL compilers.

Now, Apple, for unknown reasons, has decided to go Draconian and make all cases of this fail, even though one works perfectly. And they have produced a BUNCH of complaints. The first I saw was "strcpy() is much slower... why?" Simple answer: how can you determine whether the addresses overlap if you don't know how many bytes are to be copied? And how can you determine how many bytes are to be copied unless you either add an extra loop to search for the terminating NUL, or go ahead and do the strcpy() and then compare the final addresses with the starting addresses to see whether there was overlap? In the second case you have ALREADY done the copy, it worked just fine, but you simply say "Abort" and stop.
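To make the cost concrete, here is a rough sketch of what an overlap check for a string copy has to do. This is not Apple's implementation; string_copy_overlaps is a hypothetical name, and comparing addresses through uintptr_t assumes a flat address space:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: would copying the string at src to dst read and
 * write overlapping bytes? Returns 1 if so, 0 otherwise. */
int string_copy_overlaps(const char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src) + 1;   /* the extra pass to find the terminator */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    /* Two n-byte ranges overlap when their starts are less than n apart. */
    return (d > s) ? (d - s < n) : (s - d < n);
}
```

The check itself is a couple of comparisons; the strlen() call is what adds a second pass over the source. Note that it reports overlap in either direction, since the standard does not distinguish between them.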

If ANYONE looks at this with a rational point of view, they say "poor choice." Slow the code down. Detect and abort quietly on an error even if it is perfectly safe. And what is the net gain? 99% of the strcpy() errors are due to incorrect destination length, or incorrect source formatting (missing NULL). Why not fix those? Or try to? Only allow strcpy() to copy up to 1024 bytes and break another ten thousand programs?

You are stuck on "undefined". It is NOT undefined under the conditions I gave; you can look at the glibc code:

char *
strcpy (dest, src)
     char *dest;
     const char *src;
{
  char c;
  char *s = (char *) src;
  const ptrdiff_t off = dest - s - 1;

  do
    {
      c = *s++;
      s[off] = c;
    }
  while (c != '\0');

  return dest;
}

That will NEVER fail under the conditions I specifically gave, the specific conditions I use in my PGN parsing code. Namely that the source address is ALWAYS greater than the destination address.
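The direction dependence of a low-to-high byte copy can be shown without calling strcpy() on overlapping buffers at all. The sketch below uses a hypothetical forward_copy loop as a stand-in for that one copying strategy; it says nothing about what any particular library's strcpy() or memcpy() does, since an implementation is free to copy word-at-a-time, high-to-low, or to abort:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a naive low-to-high byte copy.
 * It is an ordinary loop over char objects, so calling it on overlapping
 * ranges is well defined, unlike calling strcpy() or memcpy() on them. */
void forward_copy(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

With a buffer holding "abcdef", forward_copy(buf, buf + 2, 5) leaves the string "cdef", because every byte is read before it is overwritten. With the pointers swapped, forward_copy(buf + 2, buf, 4) produces "ababab", because the loop re-reads bytes it has already written; memmove(buf + 2, buf, 4) on the same input gives "ababcd".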

Get off the "undefined" nonsense. It is not undefined given the correct conditions. Do ANYTHING poorly and any function can break. Just because the operands don't overlap does NOT guarantee the strcpy() will work.

Your "hypothetical question" makes no sense. Apple didn't do me any favor at all. It is IMPOSSIBLE for strcpy() to fail as I have used it. Absolutely impossible. UNLESS they choose to start copying right-to-left which would be even MORE inefficient than the tests they have already inserted. All the examples you gave me were of people using it stupidly. One CAN use it wisely with zero problems. If one uses it stupidly, NOTHING will prevent it from breaking things. Absolutely nothing...

BTW Abort does NOTHING on mac osx. I ran this under lldb. Nothing. Just "Abort" and "normal termination".

I would say Apple's solution is idiotic.

Re: A note on strcpy

Post by User923005 » Tue Nov 26, 2013 11:10 pm

The manual itself says that the behavior is undefined. Every C book I have ever read said, "Don't do that" {of course, H. Schildt does not count}.
The earliest C standard (1989) says that the behavior is undefined and you should not do it.
Your lack of rationality on this issue is very puzzling. You are highly intelligent and capable of understanding complex problems. Why is it that something as simple as this puzzles you?
It's OK to be wrong, even very, very wrong.
But when it has been conclusively proven that you are wrong, it is best to say, "I am wrong."
The second step is to correct the error.
Otherwise, you end up looking like the main character in "The Emperor's New Clothes"

The manual says, "Don't do that."
You did it anyway and then you complained about it.
Then you said that the software was wrong when it was your error.
Then when it was patiently shown to you that it fails on 50 other systems and that it is easy to fix, you still insist that you are right.
Explanation was offered that showed that it is very reasonable that overlap is not allowed.
Explanation was offered that showed why the abort is a very good idea.
You still think that you were right to rely upon undefined behavior.
Doesn't that seem strange to you?

Re: A note on strcpy

Post by User923005 » Tue Nov 26, 2013 11:17 pm

This is from the C99 standard (to show that the behavior has not changed since C89):

7.21.2.3 The strcpy function
Synopsis
1 #include <string.h>
char *strcpy(char * restrict s1,
const char * restrict s2);
Description
2 The strcpy function copies the string pointed to by s2 (including the terminating null
character) into the array pointed to by s1. If copying takes place between objects that
overlap, the behavior is undefined.
Returns
3 The strcpy function returns the value of s1.

Here is the C Standard's definition of "Undefined Behavior":
3.4.3
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
3 EXAMPLE An example of undefined behavior is the behavior on integer overflow.

Re: A note on strcpy

Post by User923005 » Tue Nov 26, 2013 11:19 pm

Be so kind as to quote the man page for your compiler's implementation of strcpy().
I am quite sure it will tell you not to use overlapped copy.
