I've got this table:
Date GroupID Channel Daysbeforelast 24/02/2015 1 A 3 26/02/2015 1 B 1 27/02/2015 1 C 0 21/03/2015 2 D 0 20/02/2015 3 E 5 25/02/2015 3 D 0 28/02/2015 4 C 0 04/03/2015 5 B 19 05/03/2015 5 E 18 10/03/2015 5 D 13 11/03/2015 5 A 12 14/03/2015 5 C 9 23/03/2015 5 F 0 28/03/2015 6 E 0
and trying to add one more column Weight so the table will look like this:
GroupID Channel Daysbeforelast Weight How it is calculated 1 A 35 0.00005 (1-x/2-x)/2 1 B 31 0.00005 (1-x/2-x)/2 1 C 29 0.0833 (x/2)/3 1 D 17 0.0833 (x/2)/3 1 E 15 0.0833 (x/2)/3 1 D 5 0.25 x/2 1 C 0 0.5 x 2 B 66 0.125 (1-x/2/2-x/2-x) 2 E 17 0.0625 (x/2/2)/2 2 D 15 0.0625 (x/2/2)/2 2 A 2 0.25 (x/2) 2 C 0 0.25 (x)/2 2 F 0 0.25 (x)/2 3 A 0 0.5 x 4 B 15 0.5 1-x 4 C 0 0.5 x
Each group can have one or more subgroups depending on the data:
When Daysbeforelast=0 then Subgroup1;
When 0<Daysbeforelast<=7 then Subgroup2;
When 7<Daysbeforelast<=14 then Subgroup3;
When 14<Daysbeforelast<=30 then Subgroup4;
Else Subgroup5.
First subgroup has weight = x (x=0.5 for example). This weight is distributed evenly across all row in subgroup in the group. Lets say group 1 has 3 rows in Subgroup1. In this case each row will have weight equal x/3.
Subgroup2 has weight = x/2 and it is distributed across all rows in that subgroup the same way.
Each subgroup until the last one will receive weight equal previous subgroup divided by 2. Last subgroup in the group (it is not always subgroup 5) will receive weight = 1 - sum of all previous weights in the group.
Note that if we don't have any rows in Subgroup2 then Subgroup3 will receive weight x/2 (not x/2/2).
This is to check that sum of Weights for each group is equal to 1.
The easiest way to do this perhaps using variables but AWS where it is hosted doesn't support variables.
My approach was to - add column that would indicate sub-group: if=0 then group y, if 0< <=7 then group y+1, etc; Don't know how to do this. - count number of subgroups for each group; - write number of case when then according to the above.
Not sure this is the best approach. Thanks for looking into my question.
Aucun commentaire:
Enregistrer un commentaire