Amazon Web Services: AWS Data Pipeline failing for large files (40 GB in total)

Friday, 9 October 2015

I am trying to run an AWS Data Pipeline that launches an EMR cluster and runs Pig scripts to process files. The pipeline and the Pig program work well for smaller data sets, but for a larger data set (40 GB in total) the job fails without even reading the input files. I suspect it has something to do with the node type (memory or disk space).

I first tried m3.2xlarge, then r3.2xlarge (10 nodes), r3.4xlarge (5 nodes), and i2.4xlarge (4 nodes), but none of them worked.

The error is below for reference:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
2.4.0-amzn-4   0.12.0      hadoop  2015-10-09 15:08:03  2015-10-09 15:13:13  GROUP_BY,FILTER,UNION

Failed!

Failed Jobs:
JobId  Alias  Feature  Message  Outputs
job_1444402875772_0001  final_output,job_count_grouped,job_count_lag,job_counts,job_counts_no_differential,last_counts,last_counts_gen,ordered_job_count  GROUP_BY  Message: Job failed!  path,

Input(s):
Failed to read data from "path"
Failed to read data from "s3://path"

Output(s):
Failed to produce result in "s3://path"
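
The summary only says "Job failed!" and gives no underlying cause, so the next place to look is the per-job logs. Below is a minimal sketch, assuming the Pig summary text has been saved or pasted into a string, that pulls the failed job IDs out of the "Failed Jobs:" section and converts each one to the matching YARN application ID. The helper names `failed_job_ids` and `to_application_id` are hypothetical and exist only for this illustration.

```python
import re
from typing import List

# Matches MapReduce job IDs such as job_1444402875772_0001.
# Aliases like job_count_grouped do not match because they have no digits.
JOB_ID = re.compile(r"\bjob_(\d+)_(\d+)\b")


def failed_job_ids(summary: str) -> List[str]:
    """Return job IDs listed after 'Failed Jobs:', in order, without duplicates."""
    _, marker, tail = summary.partition("Failed Jobs:")
    if not marker:
        return []
    # Only look at the failed-jobs section, which ends where 'Input(s):' starts.
    tail = tail.split("Input(s):", 1)[0]
    found: List[str] = []
    for match in JOB_ID.finditer(tail):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found


def to_application_id(job_id: str) -> str:
    """A YARN application ID reuses the timestamp and sequence number of the job ID."""
    return "application_" + job_id[len("job_"):]


# Sample text taken from the error output in this question.
summary = """Failed Jobs:
JobId Alias Feature Message Outputs
job_1444402875772_0001 final_output,job_count_grouped,job_count_lag GROUP_BY Message: Job failed! path,

Input(s):
Failed to read data from "s3://path"
"""

for job in failed_job_ids(summary):
    print(job, "->", to_application_id(job))
```

With the application ID in hand, running `yarn logs -applicationId application_1444402875772_0001` on the master node should show the task-level errors, assuming log aggregation is enabled on the cluster.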

Any input would be helpful.
