# Re: Find and aggregate duplicates (Dennis Shea)

Date: Wed, 18 Mar 2009 12:12:28 +0100

Hi Dennis,

Thank you for your script; however, it is too general (and too slow) for my case.
Using a trick (the days are integers and the time interval is less than 20 years),
we can carry out this task much faster: just use the days as indices into
auxiliary arrays. Below is this algorithm, using the interface of your
uniqueDays function
(http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20090317/7fdcb1f2/attachment.pl).
Maybe it will be useful to somebody; it is very simple but quite fast.

; Days are integers of the form yyyymmdd
function uniqueDays(Days[*]:integer, Data[*]:numeric)
local iDay,DayA,DayB,nDay,i,n,ii,nums,sums,x
begin
  DayA = min(Days)
  DayB = max(Days)
  nDay = DayB-DayA+1
  ; That's the trick: one slot for every integer between DayA and DayB.
  ; 1995-2008 (14 years) corresponds to nDay of about 14*10,000 = 140,000.
  ; Sure, a lot of the nums and sums slots are dead (e.g. 20021399 is not
  ; a day); however, these arrays are not very large.
  nums = new(nDay,integer)
  nums = 0
  sums = new(nDay,float)
  sums = 0.
  n = dimsizes(Days)
  do i = 0,n-1                    ; calculation time is proportional to dimsizes(Days)!!!
    iDay = Days(i)-DayA           ; use Days(i) as index
    nums(iDay) = nums(iDay)+1
    sums(iDay) = sums(iDay)+Data(i)
  end do
  ii = ind(nums.gt.0)
  ; double rather than float: a float cannot hold every 8-digit
  ; yyyymmdd value exactly (odd values above 2^24 get rounded)
  x = new((/dimsizes(ii),3/),double)
  x(:,0) = DayA+ii                ; unique days
  x(:,1) = nums(ii)               ; number of values for each day
  x(:,2) = sums(ii)/nums(ii)      ; daily average
  return(x)
end
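
A minimal test driver might look like the sketch below. It assumes the
uniqueDays function above is defined earlier in the same script; the sample
values are made up for illustration and are not from the original data. The
input is deliberately unsorted, since the index trick does not need sorted
days.

```ncl
; Hypothetical sample data (illustration only)
Days = (/19950314, 19950308, 19950314, 20081230, 20081228, 20081230/)
Data = (/22.5, 12.1, 32.0, 32.0, 16.0, 13.0/)

x = uniqueDays(Days, Data)   ; column 0: day, column 1: count, column 2: mean

; Print one line per unique day
dims = dimsizes(x)
do k = 0, dims(0)-1
  print(sprintf("%8.0f", x(k,0)) + "  n=" + sprintf("%1.0f", x(k,1)) + \
        "  mean=" + sprintf("%6.2f", x(k,2)))
end do
```

The rows come back in ascending day order, because ind() scans the auxiliary
arrays from the smallest day to the largest.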

Best regards,
Slava

-----Original Message-----
From: Dennis Shea [mailto:shea_at_ucar.edu]
Subject: Re: Find and aggregate duplicates

A sample function and test driver are attached.
Quite possibly, others could write more efficient code.

Wall Clock Timings:
8000 elements: less than 1 sec
40000 elements: 6 seconds
80000 elements: 25 sec

Good luck

An embedded and charset-unspecified text was scrubbed...
Name: aggdup.ncl
Url: http://mailman.ucar.edu/pipermail/ncl-talk/attachments/20090317/7fdcb1f2/attachment.pl

> Hello,
> We have two huge arrays with the same dimension:
> Days = (/19950308,19950314,19950314,...,20081228,20081231,20081231/)
> Data = (/ 12.1, 22.5, 32.0, 12.8, 16.0, 32.1/)
> There can be several data values for the same day.
> Days are irregular. Sure we can sort them, however there will be gaps
> and duplicates in this array.
> Is it possible to carry out the following tasks efficiently, no loops,
> using only built-in NCL functions:
> 1) Construct the list of unique days, no duplicates
> 2) Calculate amount of duplicates for each unique day
> 3) Calculate the data average for each unique day
> I see no such solution, only loops.
>
> Am I right?
>
> Thanks,
> Slava

Received on Wed Mar 18 2009 - 05:12:28 MDT
