I need to stream large text files from S3 to users' browsers after performing some action on each line in my Rails middleware. I already have a solution in which I spawn a Python subprocess that reads the data from S3, transforms it, and pipes it to its parent (Rails), which streams it to the browser.
But the system is unpredictable: roughly 10-20% of the time it streams the full file (~1 GB), and the rest of the time it streams only 1-10 MB of data and then completes.
# in controller
cmd = 'python script.py arguments'
# IO.popen("-") forks the current process: the parent gets back a pipe
# connected to the child, while the child gets nil.
pipe = IO.popen("-", "w+")
if pipe
  # parent: stream the child's output to the browser
  respond_to do |format|
    format.all { streaming_render(pipe, filename) }
  end
else
  # child: replace this process with the python script
  exec(cmd)
end
....
....
def streaming_render(pipe, filename)
  set_streaming_headers(filename)
  response.status = 200
  # Set the body to an enumerator; Rails iterates it and writes each
  # yielded chunk to the client. With no arguments, enum_for defaults
  # to :each, which for an IO yields one line at a time.
  self.response_body = pipe.enum_for
end

def set_streaming_headers(filename)
  headers["Content-Type"] = (params[:format] == 'csv') ? "text/csv" : "text/plain"
  headers["Content-Disposition"] = "attachment; filename=\"#{filename}\""
  headers['X-Accel-Buffering'] = 'no' # disable proxy buffering in nginx
  headers["Cache-Control"] ||= "no-cache"
  headers.delete("Content-Length") # length unknown; use chunked transfer
end
The Python script itself is well tested; it always loads the full data set (~1 GB) in 4-5 seconds.
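One change I'm considering on the Ruby side is to stop depending on line-based iteration and instead wrap the pipe in an explicit Enumerator that reads fixed-size binary chunks and always closes the pipe, even if the client disconnects mid-download. A sketch (untested; the 64 KB chunk size is arbitrary):

def streaming_render(pipe, filename)
  set_streaming_headers(filename)
  response.status = 200
  self.response_body = Enumerator.new do |yielder|
    begin
      # read(n) returns nil at EOF, ending the loop
      while (chunk = pipe.read(64 * 1024))
        yielder << chunk
      end
    ensure
      # runs on normal EOF and when a write to the client raises
      pipe.close
    end
  end
end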
In one of the AWS blog posts I found built-in support for streaming in the AWS PHP SDK. Does the AWS Ruby SDK have similar support? If not, how can I change my current streaming logic to make it foolproof?
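From my reading of the aws-sdk-s3 gem docs, get_object appears to stream when given a block, yielding the body in pieces as they arrive rather than buffering the whole object. A sketch of what I'd hope to use (the region, bucket, key, and transform call are placeholders):

require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(region: 'us-east-1')

# With a block, the SDK yields the response body in chunks as they are
# read off the socket instead of loading the whole object into memory.
s3.get_object(bucket: 'my-bucket', key: 'big-file.txt') do |chunk|
  transform_and_stream(chunk) # hypothetical handler for each chunk
end

One caveat I can see: the chunks would be arbitrary byte ranges, not whole lines, so my per-line transformation would have to buffer partial lines across chunk boundaries.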