#120 Improved statistics calculation performance

Merged
jens.klein merged 12 commits from stefan.schmidt/stat_perf into SPIN/master 6 years ago

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time.

Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM.

I verified that this change doesn't alter the content of the generated statistics files.

This should solve #119, although there probably are more performance improvements possible.

/edit:

I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect here (timestamps are weird, but the memory amounts are correct).

/edit 2:

I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time. Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM. I verified that this change doesn't alter the content of the generated statistics files. This should solve #119, although there probably are more performance improvements possible. /edit: I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect [here](https://imgur.com/a/gVjksyG) (timestamps are weird, but the memory amounts are correct). /edit 2: I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.
Carlos Garcia commented 6 years ago
Owner

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.
Carlos Garcia commented 6 years ago
Owner

@stefan.schmidt One small issue to address: The calculation of node degrees in statistics.cpp (as in issue #119) should only be calculated if the -t flag (for advanced statistics) is given.

As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because -t is not specified).

@stefan.schmidt One small issue to address: The calculation of node degrees in `statistics.cpp` (as in issue #119) should only be calculated if the `-t` flag (for advanced statistics) is given. As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because `-t` is not specified).
Stefan Schmidt commented 6 years ago
Collaborator

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the -t flag if that's ok.

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the ```-t``` flag if that's ok.
Carlos Garcia commented 6 years ago
Owner

I'll move them both behind the -t flag

@stefan.schmidt, yes, that is the way to go.

> I'll move them both behind the -t flag @stefan.schmidt, yes, that is the way to go.
Stefan Schmidt commented 6 years ago
Collaborator

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.
This pull request has been merged successfully!
Sign in to join this conversation.
No Milestone
No assignee
2 Participants
Loading...
Cancel
Save
There is no content yet.