#120 Improved statistics calculation performance

已合并
jens.klein 6 年之前 将 12 次代码提交从 stefan.schmidt/stat_perf合并至 SPIN/master

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time.

Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM.

I verified that this change doesn't alter the content of the generated statistics files.

This should solve #119, although there probably are more performance improvements possible.

/edit:

I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect here (timestamps are weird, but the memory amounts are correct).

/edit 2:

I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.

This replaces the combination of std::vector and std::find in statistics.cpp with an std::unordered_set, which has O(1) lookup time. Previously, it took ~10 hours to process the first ~30% of the file given in #118, now it takes around four minutes. It now takes my system less than thirteen minutes to process the whole file. However, ID2T then crashes because it runs out of RAM. I verified that this change doesn't alter the content of the generated statistics files. This should solve #119, although there probably are more performance improvements possible. /edit: I've added another commit which reduces the maximum amount of memory required while saving the statistics file by eliminating unnecessary copies. This doesn't change the required time much, but significantly brings down memory usage. I've uploaded screenshots demonstrating the effect [here](https://imgur.com/a/gVjksyG) (timestamps are weird, but the memory amounts are correct). /edit 2: I added a few more commits to reduce pass-by-value calls and remove some warnings, as well as a small bugfix.
Carlos Garcia 评论于 6 年之前
所有者

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.

Stefan, the changes are looking great. Those were indeed some of the biggest problems we were observing.
Carlos Garcia 评论于 6 年之前
所有者

@stefan.schmidt One small issue to address: The calculation of node degrees in statistics.cpp (as in issue #119) should only be calculated if the -t flag (for advanced statistics) is given.

As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because -t is not specified).

@stefan.schmidt One small issue to address: The calculation of node degrees in `statistics.cpp` (as in issue #119) should only be calculated if the `-t` flag (for advanced statistics) is given. As of now, the in-degree (and out-degree) statistics are always calculated. I might be mistaken, but the degree statistics are being calculated but then ignored and not stored in the databse (because `-t` is not specified).
Stefan Schmidt 评论于 6 年之前
协作者

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the -t flag if that's ok.

@carlos.garcia It seems that the degree-info is stored in the database, but in a separate table (ip_degrees). But, like conv_statistics_extended, this table is never used in ID2T apart from plotting, so I'll move them both behind the ```-t``` flag if that's ok.
Carlos Garcia 评论于 6 年之前
所有者

I'll move them both behind the -t flag

@stefan.schmidt, yes, that is the way to go.

> I'll move them both behind the -t flag @stefan.schmidt, yes, that is the way to go.
Stefan Schmidt 评论于 6 年之前
协作者

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.

@carlos.garcia I added commits implementing the changes, and also a fix for issue #117.
该合并请求已经成功合并!
登录 并参与到对话中。
未选择标签
Bug
未选择里程碑
未指派成员
2 名参与者
正在加载...
取消
保存
这个人很懒,什么都没留下。