Does scp lock the filesystem that is being transferred to?



x0500hl
01-Dec-2011, 17:16
Hi all,

My environment, where the problem is occurring, consists of nine zLinux
servers running on a mainframe under a single copy of z/VM. The
database server is running SLES 10 SP4. The other eight servers are
running SLES 11 SP1. There are four servers running WebSphere
Application Server (standalone) and four running IBM HTTP Server. The
non-database servers are paired up where one HTTP server communicates
with one WAS server (i.e. Prod HTTP 1 is paired with Prod WAS 1). Prod
WAS 1 hosts a filesystem that is shared with the other non-database
servers via NFS. All of the non-database servers have full read/write
access to this filesystem. The http and WAS servers all serve up data
from this filesystem to the end users.

I've been experiencing a problem that appears to occur when a large file
(3.1GB) is transferred via scp. An SQL query is run on the database
server to create a text report, which is saved on the database server
in a specified directory. Once an hour, a script on the database server
copies all files in that directory to a specific directory on Prod WAS 1.

When the problem occurs, the other three WAS servers (Prod WAS 2 - 4) stop
processing for several minutes, and Prod WAS 1 processes maybe 10% of the
requests that it normally does. Yesterday, while this problem was
occurring, one of the HTTP servers crashed (IBM HTTP Server, not zLinux).


It appears that scp is locking the filesystem for the duration of the
copy. Is this true? If so, any idea why it would lock the entire
filesystem and not the file that is being created?

If it is true that the filesystem is being locked, I will modify the
process to have scp copy the file to a different filesystem and then
move the file where it needs to reside. I'd rather not jump through this
hoop if I don't have to.

I have spent over an hour Googling for an answer to this, so I'd
appreciate any insight that you can provide.


Harley


--
x0500hl
------------------------------------------------------------------------
x0500hl's Profile: http://forums.novell.com/member.php?userid=41024
View this thread: http://forums.novell.com/showthread.php?t=448991

ab
01-Dec-2011, 17:30

I've never seen that. Testing now, I can modify files manually while
using scp to copy a huge file to the same filesystem, though admittedly
the large copy is using up a lot of I/O, so other operations are really
slow simply because the disks are being thrashed. Is it possible that is
what you are experiencing (so slow it seems to be stopped)? While the copy
is running, can you do something really simple on that filesystem, like
the following:

date && touch "${TMP:-/tmp}/test0" && date && rm "${TMP:-/tmp}/test0" && date

Note: I am using the x86_64 builds, not zLinux, of SLES.
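A rough way to put a number on "so slow it seems to be stopped" is to time one tiny create-and-delete over and over while the big copy runs. This is a minimal sketch, assuming bash and GNU date (for the %N nanoseconds format); probe_latency is a hypothetical helper name. Because touch and rm are mostly metadata operations that may be absorbed by cache, treat the result as a coarse signal rather than a disk benchmark.

```shell
#!/bin/bash
# Hypothetical helper: time one small create+delete in DIR and print the
# elapsed milliseconds. Assumes GNU date, which supports %N (nanoseconds).
probe_latency() {
    local dir=${1:-${TMP:-/tmp}}
    local f="$dir/latency_probe.$$"
    local start end
    start=$(date +%s%N)
    touch "$f" && rm -f "$f" || return 1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example: probe once a second, ten times, in the directory scp writes to.
# The path is a placeholder; point it at the real target directory.
# for i in $(seq 10); do echo "$(date +%T) $(probe_latency /path/to/target) ms"; sleep 1; done
```

Normal runs should report a few milliseconds at most; values in the hundreds or thousands while the copy is active would support the I/O contention explanation.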

Good luck.

amadera
01-Dec-2011, 19:16
It will not be locked by the OS. You can even remove the file on the target
side before the transfer is over. The only exception is a target filesystem
mounted read-only, and in that case you should be getting some form of write
error message.

Look at:
/sbin/ifconfig YOUR_NIC | egrep "col|err|drop|over|fra"
RX packets:34527802 errors:0 dropped:0 overruns:0 frame:0
TX packets:34263556 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000

Check the network port and look for errors.
Make sure the network cable is securely plugged in.
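The counters that ifconfig prints can also be read straight from /proc/net/dev, which makes them easy to log before and after a transfer. This is a minimal sketch, assuming the standard Linux /proc/net/dev layout (eight receive columns followed by eight transmit columns); nic_errors is a hypothetical helper name, and the optional second argument exists only so it can be tried against a saved copy of the file.

```shell
#!/bin/bash
# Hypothetical helper: print the error/drop/collision counters for one
# interface from /proc/net/dev (or from a saved copy passed as $2).
# Columns after "iface:" are 8 receive fields then 8 transmit fields:
#   rx: bytes packets errs drop fifo frame compressed multicast
#   tx: bytes packets errs drop fifo colls carrier compressed
nic_errors() {
    local ifc=$1 file=${2:-/proc/net/dev}
    awk -v ifc="$ifc" '
        { sub(/^[ \t]+/, "") }                # drop leading blanks
        index($0, ifc ":") == 1 {             # line starts with "iface:"
            sub(/:/, " ")                     # split "eth0:123" into two fields
            printf "rx_errs=%s rx_drop=%s rx_frame=%s tx_errs=%s tx_drop=%s colls=%s\n",
                   $4, $5, $7, $12, $13, $15
            found = 1
        }
        END { exit !found }                   # non-zero if interface not found
    ' "$file"
}

# Example: snapshot the counters before and after the hourly copy.
# nic_errors eth0
```

Comparing two snapshots taken before and after the copy shows whether any of the counters actually moved during the transfer.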

Angel


--
amadera
------------------------------------------------------------------------
amadera's Profile: http://forums.novell.com/member.php?userid=120594

Jim Henderson
01-Dec-2011, 22:10
On Thu, 01 Dec 2011 16:16:02 +0000, x0500hl wrote:

> It appears that scp is locking the filesystem for the duration of the
> copy. Is this true? If so, any idea why it would lock the entire
> filesystem and not the file that is being created?

scp in and of itself doesn't do that. But heavy I/O on a system can
cause what you're experiencing, especially if the device being written to
is a slower device (say a USB 1.1 or 2.0 device).

The way to check and see if this is the case is to do:

cat /proc/loadavg

on the target system.

The first three values are the 1-, 5- and 15-minute load averages. They
show overall system load, which on Linux includes processes waiting on I/O
(this is not reflected in CPU utilization because it's not CPU activity).
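This is a minimal sketch of reading /proc/loadavg, assuming the usual Linux format (three load averages, then runnable/total scheduling entities, then the most recent PID). The Linux load average also counts tasks in uninterruptible sleep (state D), so a high load with little CPU use during the copy points at I/O. The second helper counts those tasks with ps. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarise /proc/loadavg (or a saved copy passed as $1).
# Format: "1min 5min 15min runnable/total last_pid"
load_summary() {
    local l1 l5 l15 procs rest
    read -r l1 l5 l15 procs rest < "${1:-/proc/loadavg}"
    echo "1min=$l1 5min=$l5 15min=$l15 runnable=${procs%/*} total=${procs#*/}"
}

# Hypothetical helper: count tasks in uninterruptible sleep (state D),
# which are usually blocked waiting on I/O.
d_state_count() {
    ps -eo state= | awk '$1 == "D" { n++ } END { print n + 0 }'
}

# Example, run on the target system while the copy is in progress:
# load_summary; echo "D-state tasks: $(d_state_count)"
```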

If you install the sysstat package, you can use the iostat command (again
on the target system) to see what % of processing is waiting on I/O (the
value is the %iowait value).
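If sysstat can't be installed, a similar %iowait figure can be derived from two samples of the aggregate cpu line in /proc/stat. The fields on that line are cumulative time counters (user, nice, system, idle, iowait, irq, softirq, steal, ...). The sketch below is an approximation and is not necessarily identical to iostat's own calculation. The helper names are hypothetical. With sysstat installed, iostat -x 5 also reports per-device figures such as %util.

```shell
#!/bin/bash
# Hypothetical helper: given two "cpu ..." lines from /proc/stat, print the
# percentage of CPU time spent in iowait between them.
# Fields: cpu user nice system idle iowait irq softirq steal [guest guest_nice]
# guest time is already counted in user, so only fields 2-9 are summed.
iowait_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            t = 0
            for (i = 2; i <= 9 && i <= NF; i++) t += $i
            total[NR] = t; iowait[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iowait[2] - iowait[1]) / dt
        }'
}

# Sample the live counters N seconds apart (default 5).
sample_iowait() {
    local a b
    a=$(head -n 1 /proc/stat)
    sleep "${1:-5}"
    b=$(head -n 1 /proc/stat)
    iowait_between "$a" "$b"
}
```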

If the device's buffers are full and flushing to disk, that can seriously
impact the performance of anything that's I/O-bound.
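One way to watch for the "buffers full and flushing" condition is to sample the Dirty and Writeback lines in /proc/meminfo during the copy. Dirty is data waiting to be written, and Writeback is data being written right now. The vm.dirty_background_ratio and vm.dirty_ratio sysctls govern how much dirty data may accumulate before background flushing starts and before writers are throttled. This is a minimal sketch; writeback_status is a hypothetical name, and the optional argument exists only to allow testing against a saved file.

```shell
#!/bin/bash
# Hypothetical helper: print the Dirty and Writeback figures (in kB) from
# /proc/meminfo, or from a saved copy passed as $1.
writeback_status() {
    awk '/^Dirty:/     { d = $2 }
         /^Writeback:/ { w = $2 }     # the colon keeps WritebackTmp: out
         END { printf "dirty_kb=%d writeback_kb=%d\n", d, w }' "${1:-/proc/meminfo}"
}

# Example: sample every 2 seconds while the copy runs (Ctrl-C to stop).
# while sleep 2; do echo "$(date +%T) $(writeback_status)"; done
```

If Dirty climbs steadily during the scp and the stalls line up with large Writeback values, that fits the flushing explanation.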

Jim


--
Jim Henderson, CNA6, CDE, CNI, LPIC-1, CLA10, CLP10
Novell Knowledge Partner

x0500hl
02-Dec-2011, 14:56
Thank you for the info. I will run a test in a little while and will
gather data with commands:
/sbin/ifconfig YOUR_NIC | egrep "col|err|drop|over|fra"
cat /proc/loadavg
iostat

I ran a few tests yesterday and found that the issue appears to be
related to NFS. Note that I can tell whether WAS is processing because I
use Introscope to monitor WebSphere activity.

Test1: On Prod_WAS_1 I copied a file of almost 1GB, from the directory
that files are copied to via scp from the database server, to the /tmp
directory (root filesystem, non-NFS). WebSphere on Prod_WAS_1 stopped
processing, and HTTP on Prod_HTTP_1 stopped logging messages into its
access.log file until I cancelled the copy or the copy ended. This is the
same issue I encounter when the file is copied from the database server.


Test2: On Prod_WAS_4 I copied a file of almost 1GB, from the directory
that files are copied to via scp from the database server (hosted on
Prod_WAS_1 and served up via NFS), to the /tmp directory on Prod_WAS_4
(root filesystem, non-NFS). WebSphere on Prod_WAS_4 stopped processing,
and HTTP on Prod_HTTP_4 stopped logging messages into its access.log
file until I cancelled the copy or the copy ended. Again, this is the same
issue I encounter when the file is copied from the database server.

Test3: On Prod_WAS_4 I copied the 1GB file from the /tmp directory
(root filesystem, non-NFS) to the /opt directory (separate filesystem,
non-NFS). No issues were encountered: WAS continued processing and HTTP
continued logging messages into its access.log file.
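To repeat Test1 to Test3 with a measurement attached to each stall, the copy can run in the background while a small write is timed once a second. This is a minimal sketch, assuming bash and GNU date. copy_and_probe is a hypothetical name, and the paths in the usage line are placeholders. The probe directory is best set to one the WebSphere or HTTP server actually writes to.

```shell
#!/bin/bash
# Hypothetical harness: copy SRC to DST in the background and, once a second
# until the copy finishes, time a tiny create+delete in PROBE_DIR.
# Assumes GNU date for %N (nanoseconds). Returns cp's exit status.
copy_and_probe() {
    local src=$1 dst=$2 probe_dir=${3:-/tmp}
    local f="$probe_dir/copy_probe.$$" cp_pid t0 t1
    cp "$src" "$dst" &
    cp_pid=$!
    while kill -0 "$cp_pid" 2>/dev/null; do
        t0=$(date +%s%N)
        touch "$f" && rm -f "$f"
        t1=$(date +%s%N)
        echo "$(date +%T) probe_ms=$(( (t1 - t0) / 1000000 ))"
        sleep 1
    done
    wait "$cp_pid"
}

# Example (placeholder paths): reproduce Test1 while logging probe times.
# copy_and_probe /path/to/scp_target/bigfile /tmp/bigfile.copy /path/to/was_logs
```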



ab
02-Dec-2011, 15:42

Merely having an NFS export defined for a directory should not affect
performance. Were you actually transferring via NFS? (I assume not, since
your other copy, via scp, would not have used anything NFS-related.) I
think this line of investigation is likely a red herring. You could easily
verify this by disabling NFS for a period and then retrying the file copy
to see if the system still slows down. How long is the file copy
taking? I created a 1 GB file on my laptop (with a crappy moving-parts
hard drive) and it only took a few (okay, a little under twenty) seconds
to write the whole thing. If your server is taking longer than that, I
think that points back to the I/O contention hypotheses already presented.

time dd if=/dev/zero of=./1gb count=1024000 bs=1024 && time sync

Run this and post the output. Test it in different
directories/mountpoints if you would like to compare. Be sure to delete
the ./1gb file created each time.
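A variant of the dd test folds the flush into the timing, so the number reflects data actually reaching the device rather than only the page cache. It assumes GNU dd (for bs=1M and conv=fdatasync) and GNU date. write_test and the directory list in the usage line are hypothetical, and the scratch file is removed after every run.

```shell
#!/bin/bash
# Hypothetical helper: write SIZE_MB megabytes of zeros into DIR, forcing the
# data to disk before dd exits (conv=fdatasync), then print elapsed seconds.
# The scratch file is always removed afterwards.
write_test() {
    local dir=$1 size_mb=${2:-1024}
    local f="$dir/ddtest.$$" start end rc
    start=$(date +%s.%N)
    dd if=/dev/zero of="$f" bs=1M count="$size_mb" conv=fdatasync 2>/dev/null
    rc=$?
    end=$(date +%s.%N)
    rm -f "$f"
    [ "$rc" -eq 0 ] || return "$rc"
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}

# Example: compare several mount points (placeholder paths).
# for d in /tmp /opt /path/to/nfs_share; do echo "$d: $(write_test "$d") s"; done
```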

Good luck.

Jim Henderson
03-Dec-2011, 02:28
On Fri, 02 Dec 2011 13:56:02 +0000, x0500hl wrote:

> I ran a few tests yesterday and found that the issue appears to be
> related to NFS.

I concur with ab that, unless the target directory is on an NFS
share (i.e., you're using scp to copy from host1 to a directory on host2
that's mounted from host3 via NFS), the NFS bit is a red herring.

With the size of files you're handling, it's very likely that the
I/O-bound nature of the process is what's bogging things down. Writes can
only be cached up to a point (the data has to reach the disk eventually),
and once the buffers are full, things can slow down (that's essentially
what %iowait is, as I understand it: CPU idle time spent waiting on I/O).

What type of block device is the target file being written to?
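To answer that for any given path, and to settle whether a directory is local or NFS-mounted, df -P and GNU stat -f report the device, mount point and filesystem type behind it. This is a minimal sketch; backing_fs is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical helper: show which device/export, mount point and filesystem
# type back a given path. Uses POSIX "df -P" output and GNU "stat -f -c %T".
# Mount points containing spaces are not handled.
backing_fs() {
    local path=${1:-.}
    local dev mnt fstype
    # df -P prints a header, then one line: device blocks used avail use% mount
    read -r dev mnt < <(df -P "$path" | awk 'NR == 2 { print $1, $6 }')
    fstype=$(stat -f -c %T "$path")
    echo "device=$dev mount=$mnt type=$fstype"
}

# Example: an NFS-mounted directory would typically show a "host:/export"
# device and type=nfs.
# backing_fs /path/to/scp_target
```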

Jim

--
Jim Henderson, CNA6, CDE, CNI, LPIC-1, CLA10, CLP10
Novell Knowledge Partner