
P4600 and Linux kernel 4.13 timeout


Hi

 

I have installed two P4600 NVMe devices in a server running Proxmox 5.1-3.  The running kernel is 4.13.13-6-pve.  There is no RAID controller involved.

 

# nvme list

Node             SN                   Model                                    Namespace Usage                      Format           FW Rev

---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------

/dev/nvme0n1     BTLE736103CH4P0KGN   INTEL SSDPEDKE040T7                      1           4.00  TB /   4.00  TB    512   B +  0 B   QDV10170

/dev/nvme1n1     BTLE736103AG4P0KGN   INTEL SSDPEDKE040T7                      1           4.00  TB /   4.00  TB    512   B +  0 B   QDV10170

 

The output of "isdct show -a -intelssd" is attached in the file "intelssdp4600-2.txt".

 

Using LVM I can reliably reproduce a "hang" of 1-2 minutes that does not lead to any fatal error:

 

vgcreate SSD /dev/nvme0n1 /dev/nvme1n1

lvcreate -l 100%FREE -n SSDVMSTORE01 --stripes 2 --stripesize 128 --type striped SSD

lvremove -d -v SSD/SSDVMSTORE01

Do you really want to remove and DISCARD active logical volume SSD/SSDVMSTORE01? [y/n]: y

 

Here the command seems to hang for 1-2 minutes.  In the logs I see:

 

Feb 21 14:14:17 px kernel: [ 3654.745355] nvme nvme0: I/O 200 QID 14 timeout, aborting

Feb 21 14:14:17 px kernel: [ 3654.745772] nvme nvme0: I/O 201 QID 14 timeout, aborting

Feb 21 14:14:17 px kernel: [ 3654.746110] nvme nvme0: I/O 202 QID 14 timeout, aborting

Feb 21 14:14:17 px kernel: [ 3654.746436] nvme nvme0: I/O 203 QID 14 timeout, aborting

Feb 21 14:14:32 px kernel: [ 3669.013614] nvme nvme0: Abort status: 0x0

Feb 21 14:14:32 px kernel: [ 3669.014012] nvme nvme0: Abort status: 0x0

Feb 21 14:14:32 px kernel: [ 3669.014325] nvme nvme0: Abort status: 0x0

Feb 21 14:14:32 px kernel: [ 3669.014629] nvme nvme0: Abort status: 0x0

Feb 21 14:15:10 px kernel: [ 3707.737495] nvme nvme1: I/O 297 QID 14 timeout, aborting

Feb 21 14:15:10 px kernel: [ 3707.737902] nvme nvme1: I/O 298 QID 14 timeout, aborting

Feb 21 14:15:10 px kernel: [ 3707.738231] nvme nvme1: I/O 299 QID 14 timeout, aborting

Feb 21 14:15:10 px kernel: [ 3707.738547] nvme nvme1: I/O 300 QID 14 timeout, aborting

Feb 21 14:15:25 px kernel: [ 3722.005726] nvme nvme1: Abort status: 0x0

Feb 21 14:15:25 px kernel: [ 3722.006113] nvme nvme1: Abort status: 0x0

Feb 21 14:15:25 px kernel: [ 3722.006434] nvme nvme1: Abort status: 0x0

Feb 21 14:15:25 px kernel: [ 3722.006751] nvme nvme1: Abort status: 0x0

 

After this, the command completes without error.
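
Since the lvremove prompt confirms a DISCARD of the whole LV, I assume the stall can also be isolated outside of LVM by discarding the logical volume directly and by checking the discard limits and NVMe timeout the kernel is using. A rough sketch (the LV path is the one created above; the sysfs paths are the standard block-layer and nvme_core ones):

# WARNING: destroys the data on the LV
blkdiscard -v /dev/SSD/SSDVMSTORE01
# largest single discard the kernel will send to each device
cat /sys/block/nvme0n1/queue/discard_max_bytes
cat /sys/block/nvme1n1/queue/discard_max_bytes
# NVMe command timeout in seconds; the "timeout, aborting" messages fire when it expires
cat /sys/module/nvme_core/parameters/io_timeout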

 

This does not happen with Debian 9.3 (kernel 4.9.x).  If I partition each device into a roughly 2 GB primary partition plus a second partition covering the rest, and repeat the same operation on /dev/nvmeXn1p1 or /dev/nvmeXn1p2, the timeout occurs on the second partition but not on the first (the repeated LVM steps are sketched after the parted commands below):

parted -a optimal /dev/nvme0n1 mklabel gpt

parted -a optimal /dev/nvme0n1 mkpart primary 4 2047

parted -a optimal /dev/nvme0n1 mkpart primary 2048 100%

 

parted -a optimal /dev/nvme1n1 mklabel gpt

parted -a optimal /dev/nvme1n1 mkpart primary 4 2047

parted -a optimal /dev/nvme1n1 mkpart primary 2048 100%
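
For completeness, the "same operation" is the LVM sequence from above repeated on the partition devices; as a sketch, with placeholder VG/LV names (not necessarily the exact names I used):

vgcreate SSDPART /dev/nvme0n1p2 /dev/nvme1n1p2
lvcreate -l 100%FREE -n SSDPART01 --stripes 2 --stripesize 128 --type striped SSDPART
lvremove -d -v SSDPART/SSDPART01

For the first-partition test, substitute /dev/nvme0n1p1 and /dev/nvme1n1p1.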

 

 

Any clues?

Best,

Pierre

