Storing GPS Coordinates - MySQL, Spatial, Flat Files

posted by sacah on project, benchmark,

I'm working on a new project that will require storing peoples location, and allowing others to browse a map and see those peoples locations. In order to test which would be the best solution I wrote a PHP script to generate random clusters of GPS coordinates and store them in the tables.

The two tables I used were MySQL tables, one with an id(int), lat(decimal) and lon(decimal) columns and the second with an id(int) and coords(Point) columns. The script would save the same generated point in both tables so I could be assured the same data was in both. After generating 118 million coordinates the spatial table was double the file size of the decimal lat/lon table.

Next I cleared the data and created indexes on the tables, the first one had an index on lat and lon, the second had an index on coords.

Due to the speed issues on the Spatial index I only ran the GPS generator on one table at a time, first on the lon/lat table, it generated 604,734,087 records, weighing in at 25.3Gb(45bytes per record). Starting at 30,000 inserts per second, then as more records were inserted speed decreased as you can see in the graph below, though you can see only 3 times did it dip below 5000 inserts per second. Y axis contains the number of records, X axis is the number of inserts per second.


Next I ran the generator on the spatial table, it generated 17,160,836 records, weighing 1.6Gb(100bytes per record). This table started at 26,000 inserts per second, and quickly dipped to 39 inserts per second at the lowest point.

I was a bit surprised at the difference, though figuring the Spatial one would do much better in queries that the insert speed and filesize didn't bother me too much. I ran the following query on each table.
SELECT id FROM latlon WHERE (lon>7.1 && lon<8.1) && (lat>12.2 && lat<13.2)
SELECT id FROM spatial WHERE (X(coords)>7.1 && X(coords)<8.1) && (Y(coords)>12.2 && Y(coords)<13.2)

latlon query took 1.35 seconds, spatial query took 44.2 seconds, and remember the spatial table only has 17 million records compared to the 600 million in the latlon table. Very surprising results, I under stand the cool things you can do with selecting points that intersect a line, or are contained within a shape etc with the spatial tables, but given that our users will always be browsing via a rectangle map, a simple query like above will do fine.

If anyone has any ideas about things I might have done wrong with my spatial setup to result in it being drastically slower please let me know.

Next I thought I'd try flat files. I'll post more over the coming days about my tests with them.