#120 Improved statistics calculation performance

Merged
jens.klein merged 12 commits from stefan.schmidt/stat_perf into SPIN/master 6 years ago

This replaces the combination of std::vector and std::find in statistics.cpp with a std::unordered_set, which has O(1) lookup time on average.
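
To illustrate the difference, here is a minimal sketch of the two approaches. It is not the code from statistics.cpp; the function names and the use of strings as keys are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper showing the old pattern: a std::vector that is scanned
// with std::find before every insertion. Each lookup is O(n), so processing
// n values costs O(n^2) in the worst case.
std::size_t count_unique_vector(const std::vector<std::string>& values) {
    std::vector<std::string> seen;
    for (const auto& v : values) {
        if (std::find(seen.begin(), seen.end(), v) == seen.end()) {
            seen.push_back(v);
        }
    }
    return seen.size();
}

// Hypothetical helper showing the new pattern: std::unordered_set::insert
// ignores duplicates and runs in O(1) on average, so the loop is O(n) overall.
std::size_t count_unique_set(const std::vector<std::string>& values) {
    std::unordered_set<std::string> seen;
    for (const auto& v : values) {
        seen.insert(v);
    }
    return seen.size();
}
```

Both functions return the same count for the same input; only the lookup cost differs, and that difference grows with the number of distinct values.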

Previously, it took ~10 hours to process the first ~30% of the file given in #118; now it takes around four minutes. Processing the whole file takes less than thirteen minutes on my system. However, ID2T then crashes because it runs out of RAM.

I verified that this change doesn't alter the content of the generated statistics files.

This should solve #119, although there probably are more performance improvements possible.

/edit:

I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect [here](https://imgur.com/a/gVjksyG) (timestamps are weird, but the memory amounts are correct).

/edit 2:

I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.
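
As a rough sketch of what reducing copies and pass-by-value calls can look like in general: the `Entry` type, `total_packets`, and `Writer` below are hypothetical and do not appear in the PR; they only illustrate passing by const reference and transferring ownership with `std::move`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type; the real statistics classes are not shown here.
struct Entry {
    std::string ip;
    std::size_t packets;
};

// Pass by const reference: the caller's vector is read in place, not copied.
std::size_t total_packets(const std::vector<Entry>& entries) {
    std::size_t total = 0;
    for (const Entry& e : entries) {  // reference again, no per-element copy
        total += e.packets;
    }
    return total;
}

// Hypothetical sink that takes ownership of the rows it is given.
class Writer {
public:
    // An rvalue-reference parameter forces the caller to hand the buffer over
    // with std::move, so the heap storage is transferred instead of duplicated.
    void add_batch(std::vector<Entry>&& batch) { batch_ = std::move(batch); }

    const std::vector<Entry>& batch() const { return batch_; }

private:
    std::vector<Entry> batch_;
};
```

A caller would write `writer.add_batch(std::move(rows));` and must not rely on the contents of `rows` afterwards. Avoiding a second copy of a large buffer in this way mainly lowers peak memory rather than run time.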
Carlos Garcia commented 6 years ago
Owner

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.

Carlos Garcia commented 6 years ago
Owner

@stefan.schmidt One small issue to address: the node degrees in `statistics.cpp` (as in issue #119) should only be calculated if the `-t` flag (for advanced statistics) is given.

As of now, the in-degree and out-degree statistics are always calculated. I might be mistaken, but it looks like they are calculated and then discarded rather than stored in the database (because `-t` is not specified).

Stefan Schmidt commented 6 years ago
Collaborator

@carlos.garcia It seems that the degree info is stored in the database, but in a separate table (`ip_degrees`). However, like `conv_statistics_extended`, this table is never used in ID2T apart from plotting, so I'll move them both behind the `-t` flag if that's ok.

Carlos Garcia commented 6 years ago
Owner

> I'll move them both behind the -t flag

@stefan.schmidt, yes, that is the way to go.

Stefan Schmidt commented 6 years ago
Collaborator

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.

The pull request was merged successfully!
Labels: Bug
Milestone: none
Assignees: none
Participants: 2