A note on strcpy

Code, algorithms, languages, construction...
hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: A note on strcpy

Post by hyatt » Sat Nov 30, 2013 6:33 pm

1. I have not suggested ANYONE use undefined behavior.

2. I have not defended my code as correct, or anything else, other than it worked for N years, and still does except for Apple Mavericks.

3. I have complained about Apple changing this for no good reason. If they wanted to take an action, why not just call memmove() rather than aborting? That FIXES the problem. Why just print "Abort" with no explanation. Leading to a lot of debugging by a lot of people (just do a google search).

4. If you had programmed for a long time, the reason for the suggested usage is well-documented... If you do strcpy(s+n, s) a right-to-left copy (the only reasonable way to copy a null-terminated string) can cause problems. The only requirements to produce a potential problem are that source and destination overlap, and the length of source + source pointer >= original destination pointer. A straightforward case. If you avoid that, strcpy works just fine with overlaps.

5. strcpy() exposes many potential pitfalls. Destination shorter than source. Source non-null-terminated. Source or destination null. In short, there are plenty of ways to misuse any programming construct to cause crashes, bad results, memory leaks, buffer overwrites, or any of a host of other known problems. If you look at the list of bugs attributed to strcpy() 95% have nothing to do with overlap whatsoever. Most are source/destination length mismatches. The rest are improperly formatted strings with no terminating null which causes a copy of an undetermined number of bytes. It will eventually find a zero somewhere to stop copying, but it will likely copy way too much.

6. Changing software to break it is unacceptable. As I mentioned, MOST programs that use overlap have programmers that know what they are doing. For me, this code dates back to the early Fortran days, where there were no strings. Yet I needed to handle some sort of string to be able to parse moves to create an opening book. I wrote my own "copy()" subroutine in Fortran to do exactly that. Copy left-to-right, but without any terminating null, I just told it how many bytes to copy. And it worked. When I started Crafty, I translated utility stuff from Fortran to C by hand, so that I could get past the trivia as quickly as possible and get Crafty up and running. I simply copied what I did, but changed to follow typical C usage and switched to normal strings. And strcpy() did exactly what my old copy did, so I used it.

As I said, I do not defend the practice as good. I generally avoid such myself. But this is old, and it worked, and the change Apple made was simply pointless. They could have fixed it by using memmove() and some would have applauded. Some would not because to use memmove() you have to locate the terminating 0 which means scan down the string byte by byte BEFORE starting the copy loop or calling memmove(), and that slows the code down significantly. But Apple didn't do that, they just located the 0, then used the source, destination and length to see if the operands overlap. If so they issue "Abort" with no other explanation and terminate execution.

They had three choices:

(a) leave it alone (the best choice)

(b) fix it switching to memmove() if it detects overlap. A good (but lower performance) choice.

(c) breaking it outright (a terrible choice).

If you don't agree, that's fine. But don't start with the "I find it pretty appalling" nonsense. One should never break code intentionally. There's no way to make an "idiot-proof" strcpy() in the first place, so make it as functional and as fast as possible (which is what has been done by EVERYONE except Apple Mavericks) and move on to something else.

I posed a similar scenario with an aircraft. Did you read that? A crab angle too large is dangerous and the behavior will be undefined because it is not advised. But if you have to land in a crosswind stronger than recommended, or else stick the plane into the ground, which would you choose. A good pilot can exceed max recommended crab angle with additional speed, and a quick foot on the rudder pedal to straighten the plane up just before the wheels touch the ground. Apple's approach would be "stick it in the ground, this is undefined, we don't care whether the pilot is good or not."

No software engineering book I can think of suggests such behavior. This reminds me of the 1970's where the old Xerox UTS operating system had one error message for the command shell: "eh?" Doesn't matter whether the file doesn't exist, you had a bad wildcard, malformed directory path, bad command option, whatever. eh? They improved. Apple regressed.


For Dann.

Here's the output for the intel compiler with my sample program. First, the EXACT code:

#include <stdio.h>
void main() {
int i;
i = func(0);
printf("%d\n", i);
}
int func(int init) {
int x;

if (init)
x=1;
return (x++);
}

And here's the compiler output with normal optimization enabled:

crafty% icc -O tst.c
tst.c(2): warning #1079: return type of function "main" must be "int"
void main() {
^



I do not know what, exactly, you are doing. Data flow does not work here. But try this change which might open your eyes a bit:

where you call the function with a zero or a one value, replace that with this:

double value;

value = drand48()

i = dodo((int) value);

Does it complain or not? Value will, in this case, always be zero.

replace with

i = dodo((int) (value + 0.5)

Does it complain or not? Value will be 0 50% of time, 1 50% of time.

Compilers can't determine that and give an accurate error. Compiler doesn't even have a clue what drand48() returns, other than a floating point number.

My version of msvc does NOT find this error either. We have the latest commercial compilers they release, with automatic updates. So no idea what you are seeing, but it does not match with what I see. I get the same (no message) output for gcc 4.7.3, clang, intel C compiler (more than one version) and MSVC (I can't give version until I get to the office on monday, my windows laptop stays there as all I use it for is grading assignments.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: A note on strcpy

Post by User923005 » Sat Nov 30, 2013 9:50 pm

rand.jpg
I do not have drand() in Windows, but here is the result with rand(), which will return 0 one out of 32768 trials, since RAND_MAX is defined as 0x7fff on this platform.
The function rand() will hence return a number between 0 and 32767.

The diagnostic says, "using uninitialized memory" in this case. One could argue about whether this diagnostic is correct or incorrect because it is possible to run the program millions of times without the error, but it is also possible for it to happen the very first instance and ten times in a row. However, what is extremely clear is that I am given the line number of the defect and the defect is instantly obvious upon inspection.

As for Apple's change, I think it would have been better for them to cause abort() in debug mode and "issue a stern diagnostic" if possible, in release mode. However, we have to understand Apple's predicament also. They want to create more reliable software and runtime detection of a problem is both harder and more expensive (it would clearly require testing the addresses for every call, for example).

Because this sort of problem can create very subtle bugs, and because Apple uses the compiler to create their own software, I understand completely why they made the change.
In general, removal of undefined behavior from programs is extremely desirable, whenever it is possible.
I will admit, that in this case, they forced the removal with a ten gauge shotgun pointed into the programmer's face. But maybe that is the only thing that will get some of them to follow orders.

As for the Intel compiler, we used to use it where I work, but we stopped using it when we discovered a bug in their code generation and they did not fix it for a very long time. I found the Intel tools (in general) to be robust and excellent and so I would be surprised if the Intel compiler could not diagnose the problem.

If gcc cannot diagnose it, clang would be worth a try.
There is something wrong with my clang installation since it cannot find the standard headers, but it looks promising as far as diagnosing this type of problem:
dcorbit@dcorbit /q
$ clang -Wall -ansi -pedantic -O3 dodo.c main.c
dodo.c:3:9: warning: variable 'x' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if (init)
^~~~
dodo.c:5:5: note: uninitialized use occurs here
x++;
^
dodo.c:3:5: note: remove the 'if' if its condition is always true
if (init)
^~~~~~~~~
dodo.c:2:10: note: initialize the variable 'x' to silence this warning
int x;
^
= 0
1 warning generated.
main.c:1:10: fatal error: 'stdlib.h' file not found
#include <stdlib.h>
^
1 error generated.

dcorbit@dcorbit /q
$

Simon T
Posts: 12
Joined: Tue Sep 27, 2011 11:24 pm
Real Name: Simon Taylor
Location: Yorkshire, UK

Re: A note on strcpy

Post by Simon T » Sun Dec 01, 2013 2:29 am

hyatt wrote: void main() {
hyatt wrote: value = drand48()
hyatt wrote: i = dodo((int) (value + 0.5)
:lol:

lauriet
Posts: 10
Joined: Sun Nov 03, 2013 10:44 am
Real Name: laurie tunnicliffe

Re: A note on strcpy

Post by lauriet » Wed Mar 05, 2014 2:39 am

Hey Bob,
Did I read a few posts ago that Pascal programmers (and Java) don't have to worry about pointers ?????
Of course you must know Pascal uses pointers, and that type casting can be done.
e.g

TYPE
CastingType = RECORD
CASE BOOLEAN OF
False : (Int : Integer);
True : (Ptr : ^Integer);
END;

VAR
IntPtr : CastingType;

Now you can treat the variable as a integer or a pointer.

BEGIN
IntPtr.Int := 5000;
IntPtr.Ptr^ := 100;

Inc(IntPtr.Int, SizeOf(Integer));
IntPtr.Ptr^ := 200;
END;

Obviously this puts 100 in memory location 5000 and depending on the size of Integer puts 200 in memory location 5004. (for 4 byte ints)
A good compiler would produce tight code for this.

The advantage of this is that 'IT IS' a bit verbose and "stuffy". It tells the next person that is working on the code
to pay attention and don't take this part of the code for granted.

The discussion you have been having boils down to.... you are BOTH correct. Its horses for courses. If you are an expert and like to
control the language/machine/compiler yourself then thats what you want. If you have newbies that are still wet behind the ear its better
from an employers point of view not to let them do too much damage with undocumented features/ideocycrocies.

Of course, as I have suggested on this or other forums......you don't get this type of discussion/problem when you use Pascal.
But if people like the freedom of C then so be it.
Hey, wait a minute.....why not use assembler :o :D

hyatt
Posts: 1242
Joined: Thu Jun 10, 2010 2:13 am
Real Name: Bob Hyatt (Robert M. Hyatt)
Location: University of Alabama at Birmingham
Contact:

Re: A note on strcpy

Post by hyatt » Thu Mar 06, 2014 5:04 am

For the record, look at the original Pascal standard. This was not possible. They eventually HAD to add the mechanism that let you access the same memory address with two different types. But they didn't like it. And they have apparently kept working on those weaknesses. I will always believe C did it right from the beginning. Keep it simple, keep it flexible, let the programmer do what he wants with as little fanfare and bizarre syntax as possible.

Pascal is WAY too strongly typed for me, in any case.

lauriet
Posts: 10
Joined: Sun Nov 03, 2013 10:44 am
Real Name: laurie tunnicliffe

Re: A note on strcpy

Post by lauriet » Thu Mar 06, 2014 10:47 am

Hey Bob

:?: :?: :?:
Didn't Wirths original Pascal compiler (written in Pascal) have pointers and variant records :?:
My copy of it does.
My ISO pascal standard, User Manual and Report (Jensen and Wirth)
defines them.

:?: :?: :?:


Regards
Laurie

hiyamilt
Posts: 1
Joined: Fri Jul 18, 2014 5:01 pm
Real Name: milt johnson

Re: A note on strcpy

Post by hiyamilt » Fri Jul 18, 2014 5:10 pm

HIYA

I haven't read an strcpy man page in quite a while (the mavericks install removed the man pages!)

I have code which depends on strcpy to change /User/me/whereIAmNow/../temp to
/User/me/temp with a simple strcpy. Wrote it over 10 years ago. Runs in bsd, used to run in osx, even runs on Gates Garbage.

I think one line in a C++ header file restores the natural strcpy behavior:
inline void strcpy (char *put, char *get) {do *put++ = *get; while (*get++);}

With todays machines the loss of speed isn't a problem. Of course, a similar fix is possible in c.

P.S. my opinion: apple made a mistake.

User923005
Posts: 616
Joined: Thu May 19, 2011 1:35 am

Re: A note on strcpy

Post by User923005 » Fri Jul 18, 2014 6:42 pm

1. The strcpy() function does not return void.
2. Functions starting with str* do not live in the user's namespace, they live in the implementation's namespace.

This is from the ANSI/ISO C99 standard:

7.21.2.3 The strcpy function

Synopsis
1 #include <string.h>
char *strcpy(char * restrict s1,
const char * restrict s2);
Description

2 The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

Returns
3 The strcpy function returns the value of s1.

Who made a mistake? {Bolding is mine}

Post Reply