Sunday, August 30, 2015

EMR: Running out of local space

I am trying to bring up an EMR cluster (using boto3), but I constantly run out of local disk space for tasks such as logging, reading data from S3, etc. This is a cluster of d2.xlarge instances, which have nearly 4 TB of ephemeral storage.

  1. When I do a df -hi, I see:

    Filesystem     Inodes IUsed IFree IUse% Mounted on
    /dev/xvda1       640K  146K  495K   23% /
    devtmpfs         3.8M   483  3.8M    1% /dev
    tmpfs            3.8M     1  3.8M    1% /dev/shm
    /dev/xvdb        373M   359  373M    1% /mnt
    /dev/xvdc        373M    13  373M    1% /mnt1
    /dev/xvdd        373M    13  373M    1% /mnt2
    
    
  2. When I do a lsblk, I see:

    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    xvda    202:0    0   10G  0 disk
    └─xvda1 202:1    0   10G  0 part /
    xvdb    202:16   0  1.8T  0 disk /media/ephemeral0
    xvdc    202:32   0  1.8T  0 disk /mnt1
    xvdd    202:48   0  1.8T  0 disk /mnt2
    
    
  3. When I do a mount, I see:

    proc on /proc type proc (rw,relatime)
    sysfs on /sys type sysfs (rw,relatime)
    devtmpfs on /dev type devtmpfs (rw,relatime,size=15692656k,nr_inodes=3923164,mode=755)
    devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
    tmpfs on /dev/shm type tmpfs (rw,relatime)
    /dev/xvda1 on / type ext4 (rw,noatime,data=ordered)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
    /dev/xvdb on /mnt type xfs (rw,relatime,attr2,inode64,noquota)
    /dev/xvdc on /mnt1 type xfs (rw,relatime,attr2,inode64,noquota)
    /dev/xvdd on /mnt2 type xfs (rw,relatime,attr2,inode64,noquota)
    /dev/xvdb on /media/ephemeral0 type xfs (rw,relatime,attr2,inode64,noquota)
    
    

This seems to indicate that only 10 GB is allocated to the root filesystem, which is what gets used for logging, reading in data from S3, etc. How can I make a larger share of the ephemeral storage available for this? Should I mount one of the other available drives and use that as "/"? If so, which of the drives above should it be?
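Note that df -hi reports inode counts rather than bytes, so a byte-level view of each mount point may help confirm which filesystem is actually filling up. Below is a minimal sketch, assuming Python 3 is available on the node. The function name usage_report and the MOUNT_POINTS list are illustrative only; the paths are simply the ones that appear in the output above.

```python
import shutil

# Mount points taken from the df/lsblk output above; adjust for your nodes.
MOUNT_POINTS = ["/", "/mnt", "/mnt1", "/mnt2"]


def usage_report(mount_points):
    """Return {mount_point: (total_gib, used_gib, free_gib)} for readable paths."""
    gib = 1024 ** 3
    report = {}
    for mp in mount_points:
        try:
            usage = shutil.disk_usage(mp)
        except OSError:
            # Path is missing or unreadable on this machine; skip it.
            continue
        report[mp] = (usage.total / gib, usage.used / gib, usage.free / gib)
    return report


if __name__ == "__main__":
    for mp, (total, used, free) in usage_report(MOUNT_POINTS).items():
        print(f"{mp:<8} total={total:8.1f} GiB used={used:8.1f} GiB free={free:8.1f} GiB")
```

Running this on a node while a job is active should show whether it is "/" or one of the /mnt volumes that is running short.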


There is a potentially related question, but its solution was to use a parameter specific to the MapR distribution, and it is not clear to me how that would be used in a boto script.



