WIP: Improve scantime #212
Conversation
lon <= {lon_end}
GROUP BY spawn_id;
'''.format(
    lat_end=config.MAP_START[0],
You might want to change this to min(start[0], end[0]) and so on, in case the rectangle is defined the other way around.
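A minimal sketch of that normalization (assuming `config.MAP_START` and `config.MAP_END` are `(lat, lon)` tuples, as the diff above suggests; names are illustrative):

```python
# Normalize the bounding box so the query works regardless of which
# corner MAP_START / MAP_END refer to (illustrative sketch only).
lat_start = min(config.MAP_START[0], config.MAP_END[0])
lat_end = max(config.MAP_START[0], config.MAP_END[0])
lon_start = min(config.MAP_START[1], config.MAP_END[1])
lon_end = max(config.MAP_START[1], config.MAP_END[1])
```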
Thanks, I had forgotten about that. I had to switch start and end because it was the other way round for me, and I wanted to generalize it for others, but it looks like I didn't :s
Getting an error in db.py:

```
>>> import db
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "db.py", line 449
    return results
SyntaxError: 'return' outside function
```
That's weird. Which Python version are you using?
2.7.6
I don't have any errors on my backup server, which has the basics without this pull added.
And which file did you try to execute? Could you attach the full traceback?
Not 100% sure how to add a full traceback, still learning a lot.
I'm getting that error when I go to import the db on a clean scanner server setup, using these commands:

```
python -i
import db
```
I tried my best but I can't reproduce this error. May I ask how you applied this pull request?
distribute points among workers
Would this mean a reduction of requests sent to Niantic servers?
That depends on the user. If you don't adjust the grid parameter in the config file it will still make more or less the same number of requests, but each point will be scanned more often. As there are fewer points to scan (500 instead of 1500 for me), you could use fewer accounts to scan the same area, or keep the number of accounts and scan a larger area. Or don't change anything at all and have a higher refresh rate :) Right now I'm working with only 2/3 of the workers I did without these changes, but I've also reduced the workload for every worker.
Hello @rollator,
Oh yes, sorry I should have mentioned that:
With my data set, I'm getting
@vDrag0n I just added a small commit which could fix it for you, but I'm not really sure. Could you try it with the new version and tell me if it works now, and what your number of spawn points was? (It will be printed in the console before the most resource-intensive part, so you'll have plenty of time to take note :)
lat <= {lat_end} AND
lon >= {lon_start} AND
lon <= {lon_end}
GROUP BY spawn_id;
We don't know if spawn_id is unique. Better to GROUP BY lat, lon?
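A rough sketch of what the coordinate-grouped query could look like (the `sightings` table name is taken from the discussion below and `config.MAP_START`/`MAP_END` from the diff; treat the exact schema as an assumption):

```python
# Sketch only: same bounding-box query, grouped by coordinates instead
# of spawn_id, with the corners normalized as suggested above.
query = '''
    SELECT lat, lon
    FROM sightings
    WHERE lat >= {lat_start} AND
          lat <= {lat_end} AND
          lon >= {lon_start} AND
          lon <= {lon_end}
    GROUP BY lat, lon;
'''.format(
    lat_start=min(config.MAP_START[0], config.MAP_END[0]),
    lat_end=max(config.MAP_START[0], config.MAP_END[0]),
    lon_start=min(config.MAP_START[1], config.MAP_END[1]),
    lon_end=max(config.MAP_START[1], config.MAP_END[1]),
)
```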
Actually spawn_id is unique, but I can change it anyway as it shouldn't really make a difference :)
267k records in one db, 562k in the other. 505 duplicate spawn ids.
Source: https://www.reddit.com/r/pokemongodev/comments/4vazmk/encounter_id_is_not_unique/
The title says encounter_id, but it's really spawn_id if you read it.
Hm, I'm not really convinced he's talking about the id of spawn points. He only mentions "spawn id" once, and from the context it seems like he's talking about the encounter_id field.
In our db, spawn_id is the spawn point id, not encounter_id, right?
Right, we don't save the encounter id at all.
We do save encounter_id on the sightings table.
Thanks for the styling tips, but as I already stated in the very first post, I'm very much aware that the code quality is seriously lacking at this point. It's a prototype and more of a POC than code I'd actually want to be merged. As the title suggests, this PR is still very much work in progress and nowhere near production quality.
Using your spawn point retrieval SQL, I'm pulling 26k records from a 1.2 million row database. I've tried changing the recursion limit of Python, and just setting SAMPLES_PER_POINT to 1.
edit:
That did the trick? Great, thank you for testing :D The long runtime isn't really unexpected, as the calculations are computationally intensive and calculating distances on the Earth takes quite some time compared to a flat plane :/
It all depends on how dense your spawn points are and where you're scanning. Since I'm scanning a large area with parks, schools, and inlets, I'm saving about 50%. As for the speed problem, it's only using one core. I'm sure it could be quicker if you just multithreaded it.
Exactly, that's why I'm asking, so we can have a rough estimate which doesn't depend on only one scan location. As for the speed problem, that's definitely something I'll be looking into.
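For what it's worth, a minimal sketch of how the per-point work could be spread over several cores with the standard library; `score_point` is a hypothetical stand-in for the real calculation, not code from this PR:

```python
from multiprocessing import Pool

def score_point(point):
    # hypothetical stand-in for the expensive per-spawn-point calculation
    lat, lon = point
    return (lat, lon, lat * lat + lon * lon)

def process_all(points, workers=4):
    # distribute the points over `workers` processes
    pool = Pool(workers)
    try:
        return pool.map(score_point, points)
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(process_all([(47.36, 8.54), (47.37, 8.55)]))
```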
Tested this:
What's your SAMPLES_PER_POINT value?
@gunawanputra Right now you won't get any information about the location of a pokemon outside the 70m radius, so you'd still need to scan all the points within the cell even when you know there are pokemon in it.
@rollator Update: it turns out that, the way Python performs the map, it is ~1h including the tree creation. I will need to do some more debugging to find where it spends all the other hours.
@Aiyubi I would guess that the 3 hours are spent in that huge, horrible
Sooo it finished with samples = 1.
@Aiyubi Could you test it again please? It should run MUCH faster now, which could enable you to increase the sample size and decrease the number of optimized scan points.
I have a db with over 300k records. Will be trying this soon.
@theresthatguy I wasn't talking about db entries, I was talking about spawn points. My db contains ~350k entries, so we're pretty similar in that sense, so I guess it should run in no more than a few seconds for you, and you can even set SAMPLES_PER_POINT to a higher value (~40 was fine for me) to achieve better results :)
I've tested it and it works much quicker after the refactor. Only the checking of spawn points takes a long time.
@Aiyubi What Python & scipy versions are you using? This seems like an older scipy version.
@vDrag0n By checking of spawn points do you mean the "safety check"? That part will be removed asap; however, I first wanted confirmation from other testers that the part where it prints 'lonely circle found' never gets executed.
About 5 mins with 10 samples, before the safety check.
You're right. Added a commit to log the results to minimal_scan_calculations.log.
@rollator You were right. I ran pip for the wrong Python version and had an old scipy.
Thanks for the feedback. About the gyms: I completely forgot about them, will add them soon.
So I added spawn points to my map (#241). What I noticed is that there are still some circles that could be optimized away: the middle one has no unique points. I wonder if the Espresso algorithm could be used to minimize this (since Quine-McCluskey has exponential runtime).
Do we need to consider gyms though? It seems to me that workers find gyms in quite a huge radius as it is, so for a gym to be missed there would have to be a huge void of spawn points surrounding it. Is that ever the case? Aren't gyms "defined" to be places where "a lot is going on"?
It wouldn't be much of a hassle to add them; it's just a matter of adding some columns to the matrix. @Aiyubi I can't really tell, as I don't know this algorithm and can't find adequate resources on it. Perhaps if you could help me out there I certainly could try it, but I guess that would have to wait ~2 weeks as I just don't have the time right now :/
I decided to give this a go today, and it's a no-go. I tried with a database from 0.5.4 and one from 0.5.2 and I get different errors with both. Any ideas on something I can try?
Number of spawnpoints: 0
@nhumber Try it with Python 3.
I'm stuck on 2.7; I've tried to get my VPS up to 3.5 but I seem to mess it up every time and have to revert.
With 5763 spawn points, the safety check seemed to take quite a long time, but "A sad lonely circle has been found :(" was not printed, so if nobody else is getting it, I think it'd be safe to remove. Awesomely faster now though!
@nhumber virtualenvwrapper is pretty easy to set up (https://virtualenvwrapper.readthedocs.io/en/latest/)
Thank you for the suggestion. I searched around, found this guide and followed it: http://blog.dwaynecrooks.com/post/136668432442/installing-python-351-from-source-on-ubuntu
I got up and running and away from Python 2.7 prior to the big coroutines update!! If a newb like me can do it, anyone can ;-)

edit: Looks like I spoke too soon. I have Python 3.5 working in an env, but now I'm getting this error:

```
(env) user@MAIN:~/maptest/pm$ python worker.py
```

I see the mention of the scipy version above, but it seems I'm already up to date. Any ideas what I could have missed?

Also, I run the project in combination with a twitter bot to notify on rare/ultra rare spawns, first for the notifications and secondly for data collection, with the hope that at some point we can get to predicting the spawns rather than scanning for them.

This is slightly OT, but has anyone tried narrowing down spawn points to remove points that never spawn anything rare? I realize this is not the scope of the project, but do we know if certain spawn points NEVER spawn anything rare/good? If so, can those points just be eliminated? I was thinking about sorting the db by spawn point location and then again by pokemon ID and removing any points that never spawn anything but common pokemon. Good idea or bad idea?
Update 1.2
Reworked distance calculations, should be seriously faster now. To achieve that I had to add `scipy`, `numpy` and `pyproj` as dependencies. Would appreciate if some folks with bigger datasets than me (>2k spawn points) could test it.
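For readers wondering what the new dependencies buy: a minimal sketch of the kind of neighbour query they make fast, assuming the question is "which spawn points lie within 70 m of a candidate scan point". This only illustrates the approach and is not the PR's actual code.

```python
import numpy as np
import pyproj
from scipy.spatial import cKDTree

# spawn points as (lat, lon); replace with data from your db
spawns = np.array([[47.3600, 8.5400],
                   [47.3605, 8.5407],
                   [47.3700, 8.5500]])

# project to metres so Euclidean distances approximate real distances
# (UTM zone 32 fits this example area; pick the zone for your map)
proj = pyproj.Proj(proj='utm', zone=32, ellps='WGS84')
x, y = proj(spawns[:, 1], spawns[:, 0])   # Proj takes (lon, lat)
tree = cKDTree(np.column_stack([x, y]))

# indices of all spawn points within 70 m of a candidate scan point
cx, cy = proj(8.5403, 47.3602)
print(tree.query_ball_point([cx, cy], r=70))
```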
Update 1.1
Did some refactoring to improve code quality and fixed something which may very well have led to an infinite loop.
Update 1
What do you need to run the PR at this stage?
Set `SAMPLES_PER_POINT = 5` in config.py. Increasing the value increases the computation time needed before we can start, but results in a smaller set of scanning points. However, increasing it beyond 10 won't improve the result significantly.

Will it reduce server requests?
No, not unless you reduce the grid parameters in the config file. You can expect a decrease of points to scan of about ~50%, so you could use 50% fewer workers to get the same results as before, which in turn would lead to fewer server requests.
Still missing:
- code quality is seriously lacking, will be fixed soon. (fixed)

TL;DR: set `SAMPLES_PER_POINT = 5` in config.py and have the area prescanned. By using these changes you will have to do 50% fewer scans.

edit: I just realized I was talking bullshit about the efficiency gain. It wasn't 30% more effective for me; it reduced the scanning points to 30%, so the speedup is way higher. Corrected the corresponding passages.
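For reference, a minimal config.py excerpt with the settings mentioned here (`MAP_START`/`MAP_END` appear in the diff, `SAMPLES_PER_POINT` is described above; the values are purely illustrative):

```python
# config.py (excerpt) -- illustrative values only
MAP_START = (47.3500, 8.5200)   # one corner of the scan rectangle (lat, lon)
MAP_END = (47.3900, 8.5800)     # opposite corner
SAMPLES_PER_POINT = 5           # higher = longer precomputation, fewer scan points
```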
Reduce the scan time by calculating a set of scan points which covers all distinct spawns. This leads to a reduced number of points to scan without missing a single pokemon.
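To give a feel for the idea, here is a toy greedy set-cover sketch over points already projected to metres; it is one common way to pick covering scan centres and not necessarily the method this PR implements.

```python
import numpy as np
from scipy.spatial import cKDTree

def greedy_cover(points_xy, radius=70.0):
    """Pick scan centres from points_xy (metres) until every point lies
    within `radius` of a chosen centre. Toy example, not the PR's code."""
    tree = cKDTree(points_xy)
    uncovered = set(range(len(points_xy)))
    centres = []
    while uncovered:
        # pick the point whose 70 m circle covers the most uncovered points
        best = max(uncovered,
                   key=lambda i: len(uncovered.intersection(
                       tree.query_ball_point(points_xy[i], r=radius))))
        centres.append(best)
        uncovered -= set(tree.query_ball_point(points_xy[best], r=radius))
    return centres

pts = np.array([[0, 0], [10, 5], [30, 40], [200, 200], [210, 205]], dtype=float)
print(greedy_cover(pts))   # indices of the chosen scan centres
```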
As the title states, it's still work in progress. What's missing at this point: