#120 Improved statistics calculation performance

Samengevoegd
jens.klein heeft 12 commits samengevoegd van stefan.schmidt/stat_perf naar SPIN/master 6 jaren geleden

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time.

Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM.

I verified that this change doesn't alter the content of the generated statistics files.

This should solve #119, although there probably are more performance improvements possible.

/edit:

I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect here (timestamps are weird, but the memory amounts are correct).

/edit 2:

I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time. Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM. I verified that this change doesn't alter the content of the generated statistics files. This should solve #119, although there probably are more performance improvements possible. /edit: I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect [here](https://imgur.com/a/gVjksyG) (timestamps are weird, but the memory amounts are correct). /edit 2: I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.
Carlos Garcia reageerde 6 jaren geleden
Eigenaar

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.
Carlos Garcia reageerde 6 jaren geleden
Eigenaar

@stefan.schmidt One small issue to address: The calculation of node degrees in statistics.cpp (as in issue #119) should only be calculated if the -t flag (for advanced statistics) is given.

As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because -t is not specified).

@stefan.schmidt One small issue to address: The calculation of node degrees in `statistics.cpp` (as in issue #119) should only be calculated if the `-t` flag (for advanced statistics) is given. As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because `-t` is not specified).
Stefan Schmidt reageerde 6 jaren geleden
Medewerker

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the -t flag if that's ok.

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the ```-t``` flag if that's ok.
Carlos Garcia reageerde 6 jaren geleden
Eigenaar

I'll move them both behind the -t flag

@stefan.schmidt, yes, that is the way to go.

> I'll move them both behind the -t flag @stefan.schmidt, yes, that is the way to go.
Stefan Schmidt reageerde 6 jaren geleden
Medewerker

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.
Dit pull-request is samengevoegd!
Sign in to join this conversation.
Geen mijlpaal
Geen verantwoordelijke
2 deelnemers
Laden...
Annuleren
Opslaan
Er is nog geen inhoud.