Gonzalo Martinez: Learning Erlang, part 6: Modules II

One last function to add to the module, using both previous functions:

greet_and_add_two(X) ->
    hello(),
    add(X,2).

Don't forget to add greet_and_add_two/1 to the list of exported functions. The calls to hello/0 and add/2 don't need the module name in front of them, because they are declared in the module itself.

If you had wanted to be able to call io:format/1 in the same manner as add/2 or any other function defined in the module, you could have added the following module attribute at the beginning of the file: -import(io, [format/1]). Then you could call format("Hola Mundo!~n"). directly. More generally, you can follow this recipe:

-import(Module, [Function1/Arity, ..., FunctionN/Arity]).

Importing a function is nothing more than a shortcut for programmers when writing their code. Erlang programmers often discourage the use of the -import attribute, as some people find it reduces the readability of code. In the case of io:format/2, the function io_lib:format/2 also exists; if one of them is used, the programmer would have to go to the top of the file to find out which of the two it is. Consequently, leaving the module name in is considered good practice. Usually, the only imported functions you'll see come from the lists module: its functions are used much more frequently than those of other modules.

Your useless module should now look something like this:

-module(useless).
-export([add/2, hello/0, greet_and_add_two/1]).

add(A,B) ->
    A + B.

%%
%%
hello() ->
    io:format("Hola mundo!~n").

greet_and_add_two(X) ->
    hello(),
    add(X,2).

We're done with the 'useless' module. You can save the file under the name useless.erl. The file name should be the module name as defined in the -module attribute, followed by '.erl', which is the standard Erlang file extension.

Before seeing how to compile the module and finally trying out all its functions, let's look at how to define and use macros. Erlang macros are really similar to C's '#define' statements, mainly used to define short functions and constants. They are simple expressions represented by text that will be replaced before the code is compiled for the VM. Such macros are mainly useful to avoid having magic values floating around your modules. A macro is defined as a module attribute of the form -define(MACRO, some_value). and is used as ?MACRO inside any function defined in the module. A 'function' macro would be written as -define(sub(X,Y), X-Y). and used like ?sub(23,47), later replaced by 23-47 by the compiler. Some people will use more complex macros, but the basic syntax stays the same.
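Both kinds of macros can be sketched in a tiny module (the module name macro_demo and the values here are made up for illustration):

```erlang
-module(macro_demo).
-export([area/1, diff/0]).

-define(PI, 3.14159).        %% constant macro, used as ?PI
-define(sub(X, Y), X - Y).   %% 'function' macro, used as ?sub(A, B)

area(R) ->
    ?PI * R * R.    %% the compiler sees 3.14159 * R * R

diff() ->
    ?sub(23, 47).   %% replaced by 23 - 47 before compilation
```

After compiling with c(macro_demo)., calling macro_demo:diff(). in the shell should return -24.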

Compiling the code

Erlang code is compiled to bytecode to be used by the virtual machine. You can call the compiler from several places: $ erlc flags file.erl from the command line, compile:file(FileName) from the shell or from within a module, c() in the shell, etc.

It's time to compile our useless module. Open the Erlang shell and type the following:

1> cd("/path/to/where/you/saved/the-module").
"Path name to the directory you are in"
ok

By default, the shell only looks for files in the directory it was started in and in the standard library: cd/1 is a function defined exclusively for the Erlang shell, telling it to change to a new directory so that browsing for our files is less annoying. Windows users should remember to use forward slashes. When this is done, do the following:

2> c(useless).
{ok, useless}

If you get any other message, make sure the file name is right, that you're in the right directory, and that there are no errors in your module. Once you compile the code successfully, you'll notice that a useless.beam file was added in the same directory as your useless.erl. This is the compiled module. Let's try our functions:

3> useless:add(7,2).
9
4> useless:hello().
Hola mundo!
ok
5> useless:greet_and_add_two(-3).
Hola mundo!
-1
6> useless:not_a_real_function().
** exception error: undefined function useless:not_a_real_function/0

The functions work as expected: add/2 adds numbers, hello/0 prints "Hola mundo!", and greet_and_add_two/1 does both. Of course, you might be asking why hello/0 returns the atom ok after the printed text. This is because Erlang functions and expressions must always return something, even when they wouldn't need to in other languages. As such, io:format/1 returns 'ok' to denote a normal condition, the absence of errors.

Expression 6 shows an error being raised because the function doesn't exist. If you forget to export a function, this is the kind of error message you'll get when you try to run it.

There are plenty of compilation flags available for more control over how a module is compiled. You can get a list of all of them in the Erlang documentation [0]. The most common ones are:

-debug_info
Erlang tools such as debuggers, code coverage and static analysis tools use the debug information of a module in order to do their work.

-{outdir, Dir}
By default, the Erlang compiler will create the 'beam' files in the current directory. This lets you choose where to put the compiled file.

-export_all
Will ignore the module's -export attribute and instead export all the functions defined. This is mainly useful when testing and developing new code, but should not be used in production.

-{d, Macro} or {d, Macro, Value}
Defines a macro to be used in the module, where Macro is an atom. This is most frequently used when dealing with unit testing, ensuring that a module will only have its testing functions created and exported when they are explicitly wanted. By default, Value is 'true' if it's not defined as the third element of the tuple.

To compile our useless module with some flags, we could do the following:

7> compile:file(useless, [debug_info, export_all]).
{ok, useless}
8> c(useless, [debug_info, export_all]).
{ok, useless}

You can also be sneaky and define compile options from within a module, with a module attribute:

-compile([debug_info, export_all]).

Then just compiling will give you the same results as if you passed the flags by hand. Now we're able to write functions, compile them and execute them. It's time to see how far we can take this.

Late and little by little, I keep translating as best I can, but learning something every day. [1]
[0] http://erlang.org/doc/man/compile.html
[1] http://learnyousomeerlang.com/modules

Marcelo Fernández: FLISOL 2014 in Luján

The Free Software users group of the University of Luján (UNLUX) invites the whole community to take part in the 2014 edition of FLISOL, the Latin American Free Software Install Fest, in the city of Luján, to be held on Saturday, April 26th, in concert with numerous cities across Argentina and the continent. The activities will take place at the central campus of the Universidad Nacional de Luján (UNLu) starting at 1:00 PM.

As in previous editions, the members of the group will install Free Software (GNU/Linux, Firefox, etc.), free of charge and completely legally, on the computers that attendees bring to the event. During the event there will be informative and technical talks on different aspects of Free Software.

We invite you to participate by bringing your machines, whether to have Free Software installed, to get problems with existing installations solved, or simply to take part in a different kind of day, exchanging experiences about Free Software and sharing an afternoon in a pleasant environment.

About FLISOL

FLISOL 2014 in Luján
The Latin American Free Software Install Fest is an event that has been held annually for almost a decade, promoting collaborative work and helping people get to know the world of Free Software.
It is organized by several user groups from the countries involved, gathered around this initiative, which brings together participants from Argentina, Bolivia, Brazil, Chile, Colombia, Cuba, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Uruguay and Venezuela, among others. In Argentina, FLISOL events have already been confirmed in several cities.

The gathering is aimed at everyone who wants to learn more about free software, install it, and use their computers while preserving their freedoms, legally and without worrying about viruses and other problems common to proprietary software. During the day, installations are done entirely free of charge, while in parallel several outreach talks are offered to promote the use and philosophy of Free Software.

Talk Schedule

The talk schedule is being updated constantly. Check flisol.info/FLISOL2014/Argentina/Lujan or www.unlux.com.ar to keep up with the news.

Registration

By filling out this form, you help us estimate attendance and better prepare the available spaces:
http://eventosimple.net/event/sp/publication/flisol-unlu/register

That said, admission is free of charge, and registering is not necessary to attend the talks. If you're planning to bring a machine to install or configure, please do register and carefully read the related notes and details.

About UNLUX

UNLUX is a group of Free Software users and enthusiasts that has existed since 2005 and carries out its activities within the Universidad Nacional de Luján. It has taken part in FLISOL since 2006, installing Free Software and spreading the word about the advantages of using it in educational, social, technical, professional and personal contexts.

To find out how to get in touch, you can start at www.unlux.com.ar and subscribe to the mailing list and the IRC channel.

On the agenda

Diego Sarmentero: Testing QPlainTextEdit and QTextEdit scrolling

I was looking for a way to implement smooth scrolling in a QPlainTextEdit component, because the per-line scrolling looks awful... and I found out that this kind of scrolling doesn't seem to be supported in QPlainTextEdit, but it is the default kind of scrolling in QTextEdit, which to my surprise isn't a specialization of QPlainTextEdit.

QPlainTextEdit version code: http://linkode.org/IYhVW93lpgeEahMDknzfe5

QTextEdit version code: http://linkode.org/xSUXB2BtJ2oFRFshlELyJ


And here is a video of how the scrolling differs between each one:


Diego Sarmentero: Shows I liked from beginning to end

Motivated by the gigantic disappointment that was the finale of "How I Met Your Mother", it occurred to me to make a list of shows I liked from beginning to end, whose finales were actually worth watching.

(WARNING: This list is totally subjective to me :P... and it contains NO SPOILERS, so those who haven't watched these shows yet can read with peace of mind)

Battlestar Galactica

Galactica doesn't just have the best finale I've seen; it's the best show of all BY FAR! I've already watched it several times, and the progression across all the seasons and the culmination in the final episode are, to me, the best there is.



Breaking Bad

I don't think there's anyone who hasn't heard how glorious this show was; seeing the way the characters evolved through each season, and the paths the story takes, it's a pleasure to watch how well thought out it all is.



Fringe

I also liked this show very much, and although it starts with fairly generic subjects, once the central storyline is introduced everything becomes more and more gripping, and I enjoyed every episode from the beginning to the end of the show.



The Office

The first episodes of this show can make you cringe while watching them; Michael's character makes you uncomfortable as a viewer. But the truth is that afterwards it's tremendous how all the stories and characters end up winning you over, and although in one of the last seasons something happened that I thought was going to ruin the show, they knew how to handle it, and the finale was really great.



Dollhouse

This show starts with entertaining but somewhat "disposable" episodes, you could say; it feels like a show that's good, but just one more show to pass the time. Then at one point it takes a turn and the story becomes really gripping, and you find yourself waiting episode after episode to see how certain situations will be resolved. The finale surprised me in many ways.



Angel

I liked this show a lot, because it managed to have drama, adventure, comedy, everything! And to describe the finale, the only word that comes to mind is "epic", which is the only possible ending you could expect from this show.
Although the series is excellent from beginning to end, and the last episode is the finale a show like this deserves, a series of comics came out afterwards for those who want to keep exploring what happened to the characters later, etc.




Other shows I'm watching that haven't ended yet, but in which I have GREAT FAITH that they'll keep up the level the whole series has had so far and deliver a really cool finale, are:
  • The Newsroom
  • Sherlock
  • Banshee
  • White Collar

To keep track of my shows I use: http://tvstalker.tv

Diego Sarmentero: PyDay Córdoba 2014 - Organizational Kickoff

Hi!
For those people from Córdoba (Argentina) interested in taking part in the organization of PyDay Córdoba 2014, which we intend to hold in the first days of August... HERE IS THE MESSAGE YOU'VE ALL BEEN WAITING FOR!! :P

We're getting together on Friday the 11th (of April, next week) at 8 PM at AlfonsinaII (Belgrano 763 - Casa Tomada) to hold the first meeting, see who's interested in joining the team, go over the necessary tasks and who wants to take them, etc, etc.

So if you've always dreamed of glory, fame and passing into immortality as one of the organizers of Córdoba's PyDay, THIS IS YOUR CHANCE!! (promotion not valid for any living being).

See you there!

Damián Avila: Slideviewer: a simple way to share your IPython slides

Short Notice:

After some months of silence, I am back... A lot of things have happened in my life lately; some of them are really good things... others, not so much. And all those things kept me very busy, but I've finally made some time to write again! ;-)

OK, I have a long list of news, topics, ideas and developments I want to share with you, but we have to begin with one of them, and the chosen one is Slideviewer.


Marcos Dione: appending-osm-data-with-flat-nodes

First: one thing I didn't do in previous post was to show the final tables and sizes. Here it is:

 Schema |        Name        | Type  | Owner  |  Size   | Description
--------+--------------------+-------+--------+---------+-------------
 public | geography_columns  | view  | mdione | 0 bytes |
 public | geometry_columns   | view  | mdione | 0 bytes |
 public | planet_osm_line    | table | mdione | 11 GB   |
 public | planet_osm_point   | table | mdione | 2181 MB |
 public | planet_osm_polygon | table | mdione | 23 GB   |
 public | planet_osm_roads   | table | mdione | 2129 MB |
 public | raster_columns     | view  | mdione | 0 bytes |
 public | raster_overviews   | view  | mdione | 0 bytes |
 public | spatial_ref_sys    | table | mdione | 3216 kB |

 Schema |            Name             | Type  | Owner  |       Table        |  Size   | Description
--------+-----------------------------+-------+--------+--------------------+---------+-------------
 public | planet_osm_line_index       | index | mdione | planet_osm_line    | 4027 MB |
 public | planet_osm_point_index      | index | mdione | planet_osm_point   | 1491 MB |
 public | planet_osm_point_population | index | mdione | planet_osm_point   | 566 MB  |
 public | planet_osm_polygon_index    | index | mdione | planet_osm_polygon | 8202 MB |
 public | planet_osm_roads_index      | index | mdione | planet_osm_roads   | 355 MB  |
 public | spatial_ref_sys_pkey        | index | mdione | spatial_ref_sys    | 144 kB  |

The first thing to notice is that not only are the intermediate tables and their indexes gone, but all the _pkey indexes are missing as well.

What I said in my previous post was that I couldn't update because the intermediate tables were missing. That was actually my fault: I didn't read osm2pgsql's manpage carefully, and it turns out that the --drop option is not for dropping the tables before importing, but for dropping the intermediate tables after the import.

This means I had to reimport everything, and this time I made sure that I had the memory consumption log. But first, the final sizes:

 Schema |        Name        |   Type   | Owner  |    Size    | Description
--------+--------------------+----------+--------+------------+-------------
 public | contours           | table    | mdione | 21 GB      |
 public | contours_gid_seq   | sequence | mdione | 8192 bytes |
 public | geography_columns  | view     | mdione | 0 bytes    |
 public | geometry_columns   | view     | mdione | 0 bytes    |
 public | planet_osm_line    | table    | mdione | 11 GB      |
 public | planet_osm_nodes   | table    | mdione | 16 kB      |
 public | planet_osm_point   | table    | mdione | 2181 MB    |
 public | planet_osm_polygon | table    | mdione | 23 GB      |
 public | planet_osm_rels    | table    | mdione | 871 MB     |
 public | planet_osm_roads   | table    | mdione | 2129 MB    |
 public | planet_osm_ways    | table    | mdione | 42 GB      |
 public | raster_columns     | view     | mdione | 0 bytes    |
 public | raster_overviews   | view     | mdione | 0 bytes    |
 public | spatial_ref_sys    | table    | mdione | 3216 kB    |

 Schema |           Name           | Type  | Owner  |       Table        |  Size   | Description
--------+--------------------------+-------+--------+--------------------+---------+-------------
 public | contours_height          | index | mdione | contours           | 268 MB  |
 public | contours_pkey            | index | mdione | contours           | 268 MB  |
 public | contours_way_gist        | index | mdione | contours           | 1144 MB |
 public | planet_osm_line_index    | index | mdione | planet_osm_line    | 4022 MB |
 public | planet_osm_line_pkey     | index | mdione | planet_osm_line    | 748 MB  |
 public | planet_osm_nodes_pkey    | index | mdione | planet_osm_nodes   | 16 kB   |
 public | planet_osm_point_index   | index | mdione | planet_osm_point   | 1494 MB |
 public | planet_osm_point_pkey    | index | mdione | planet_osm_point   | 566 MB  |
 public | planet_osm_polygon_index | index | mdione | planet_osm_polygon | 8207 MB |
 public | planet_osm_polygon_pkey  | index | mdione | planet_osm_polygon | 1953 MB |
 public | planet_osm_rels_idx      | index | mdione | planet_osm_rels    | 16 kB   |
 public | planet_osm_rels_parts    | index | mdione | planet_osm_rels    | 671 MB  |
 public | planet_osm_rels_pkey     | index | mdione | planet_osm_rels    | 37 MB   |
 public | planet_osm_roads_index   | index | mdione | planet_osm_roads   | 358 MB  |
 public | planet_osm_roads_pkey    | index | mdione | planet_osm_roads   | 77 MB   |
 public | planet_osm_ways_idx      | index | mdione | planet_osm_ways    | 2161 MB |
 public | planet_osm_ways_nodes    | index | mdione | planet_osm_ways    | 52 GB   |
 public | planet_osm_ways_pkey     | index | mdione | planet_osm_ways    | 6922 MB |
 public | spatial_ref_sys_pkey     | index | mdione | spatial_ref_sys    | 144 kB  |

This time you'll probably notice a difference: there's this new contours table with a couple of indexes. This table contains data that I'll be using for drawing hypsometric lines (also known as contour lines) on my map. This 21GiB table contains all the data from 0 to 4000+m in 50m increments for the whole of Europe and some parts of Africa and Asia, except for anything above 60°N, which means that Iceland, most of Scandinavia and the north of Russia are out. At that size, I think it's a bargain.

As with jburgess' data, we have the intermediate data, and quite a lot of it. Besides the 21GiB extra for contours, we notably have 42+52+2+7GiB for ways. In practice this means that, besides some of my files, OSM+contour data uses almost all of the 220GiB of SSD space, so I'll just move all my stuff off the SSD :( Another alternative would be to just reimport the whole data set from time to time (once a month, or each time I update my rendering rules, which I plan to do based on openstreetmap-carto's releases, but not on every one of them).

During the import I logged the memory usage of the 10 most memory-hungry processes on the machine with this command:

( while true; do date -R; ps ax -o rss,vsize,pid,cmd | sort -rn | head; sleep 60; done ) | tee -a mem.log

Then I massaged that file with a little bit of Python and obtained a CSV file which I graphed with LibreOffice. I tried several formats and styles, but to keep things readable I only graphed the sum of all the postgres processes and osm2pgsql. This is the final graph:

Here you can see 4 lines, 2 for the sum of postgres and two for osm2pgsql. The thick lines graph the RSS for each, which is the resident, real RAM usage of that process. The corresponding thin line shows the VIRT size, which is the amount of memory malloc()'ed by the processes. As with any memory analysis under Linux, we have the problem that each process also reports the memory used by the libraries it links, and if there are libraries common among them, they will be counted several times. Still, for the amounts of memory we're talking about here, we can say that's negligible compared with the memory used by the data.

In the graph we can clearly see the three phases of the import: first filling up the intermediate tables, then the real data tables themselves, then the indexing. The weird curve we can see in the middle phase for osm2pgsql could be due to unused memory being swapped out. Unluckily I didn't log the memory/swap usage to support this theory, so I'll take it into account for the next run, if there is one. In any case, the peak at the end of the second phase also seems to support the idea.

One thing that surprises me is the real amount of memory used by osm2pgsql. I told it to use 2GiB for cache, but at its peak it uses 3 times that amount, and all the time it has another 2GiB requested from the kernel. The middle phase is also hard on postgres, but it doesn't take that much during indexing; luckily, by that point osm2pgsql has released everything, so most of the RAM is used as kernel cache.
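The "little bit of Python" used to massage mem.log into a CSV isn't shown in the post; here's a minimal sketch of what it might look like, under my own assumptions about the log format produced by the ps loop shown earlier (the function names are mine):

```python
import csv
import io
import re

# Each sample in mem.log starts with a `date -R` line, followed by the
# top `ps ax -o rss,vsize,pid,cmd` rows. RSS is reported in KiB.
DATE_RE = re.compile(r'^\w{3}, \d{2} \w{3} \d{4}')

def summarize(log_lines):
    """Yield (timestamp, postgres_rss_kib, osm2pgsql_rss_kib) per sample."""
    stamp, pg, osm = None, 0, 0
    for line in log_lines:
        line = line.rstrip('\n')
        if DATE_RE.match(line):
            if stamp is not None:
                yield stamp, pg, osm
            stamp, pg, osm = line, 0, 0
            continue
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip the ps header line and any noise
        rss, cmd = int(parts[0]), parts[3]
        if 'postgres' in cmd:
            pg += rss            # sum all postgres backends
        elif 'osm2pgsql' in cmd:
            osm += rss
    if stamp is not None:
        yield stamp, pg, osm

def to_csv(log_lines):
    """Render the summary as CSV text, ready for LibreOffice."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['timestamp', 'postgres_rss_kib', 'osm2pgsql_rss_kib'])
    writer.writerows(summarize(log_lines))
    return buf.getvalue()
```

Graphing the VIRT line as well would just mean also accumulating the second ps column.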

13 paragraphs later, I finally write about the reason for this post: updating the database with daily diffs. As I already mentioned, the data as imported took up almost all the available space, so I was very sensitive about the amount of space used. But first, the sizes and times.

The file 362.osc.gz, provided by Geofabrik as the diff for Europe for Mar05, weighs almost 25MiB, but it's compressed XML inside. Luckily osm2pgsql can read it directly. Here's the summary of the update:

$ osm2pgsql --append --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache --cache 2048 --number-processes 4 --unlogged --bbox -11.9531,34.6694,29.8828,58.8819 362.osc.gz
Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
Maximum node in persistent node cache: 2701131775
Mid: pgsql, scale=100 cache=2048

Reading in file: 362.osc.gz
Processing: Node(882k 3.7k/s) Way(156k 0.65k/s) Relation(5252 25.50/s)  parse time: 688s [11m28]

Node stats: total(882823), max(2701909278) in 240s [4m00]
Way stats: total(156832), max(264525413) in 242s [4m02]
Relation stats: total(5252), max(3554649) in 206s [3m26]

Going over pending ways...
Maximum node in persistent node cache: 2701910015
        122396 ways are pending

Using 4 helper-processes
Process 3 finished processing 30599 ways in 305 sec [5m05]
Process 2 finished processing 30599 ways in 305 sec
Process 1 finished processing 30599 ways in 305 sec
Process 0 finished processing 30599 ways in 305 sec
122396 Pending ways took 307s at a rate of 398.68/s [5m07]

Going over pending relations...
Maximum node in persistent node cache: 2701910015
        9432 relations are pending

Using 4 helper-processes
Process 3 finished processing 2358 relations in 795 sec [13m15]
Process 0 finished processing 2358 relations in 795 sec
Process 1 finished processing 2358 relations in 795 sec
Process 2 finished processing 2358 relations in 810 sec [13m30]
9432 Pending relations took 810s at a rate of 11.64/s

node cache: stored: 675450(100.00%), storage efficiency: 61.42% (dense blocks: 494, sparse nodes: 296964), hit rate: 5.12%

Osm2pgsql took 1805s overall [30m05]

This time it's in the order of minutes instead of hours, but still, ~30m for only 25MiB seems a little bit too much. If I process the diff files daily, it would take ~15h a month, spread in ~30m stretches each day. Also, that particular file was one of the smallest I have (between Mar03 and Mar17); most of the rest are above 30MiB, up to 38MiB for Mar15 and Mar17 each. Given the space problems this causes, I might as well import before each rerender. Another thing to note is that the cache is quite useless, falling from ~20% to ~5% hit rate; I could try with lower caches too. The processing speeds are awfully smaller than at import time, but the small amount of data is the prevailing factor here.

Sizes:

 Schema |        Name        |   Type   | Owner  |    Size    | Description
--------+--------------------+----------+--------+------------+-------------
 public | contours           | table    | mdione | 21 GB      |
 public | contours_gid_seq   | sequence | mdione | 8192 bytes |
 public | geography_columns  | view     | mdione | 0 bytes    |
 public | geometry_columns   | view     | mdione | 0 bytes    |
 public | planet_osm_line    | table    | mdione | 11 GB      |
 public | planet_osm_nodes   | table    | mdione | 16 kB      |
 public | planet_osm_point   | table    | mdione | 2184 MB    |
 public | planet_osm_polygon | table    | mdione | 23 GB      |
 public | planet_osm_rels    | table    | mdione | 892 MB     |
 public | planet_osm_roads   | table    | mdione | 2174 MB    |
 public | planet_osm_ways    | table    | mdione | 42 GB      |
 public | raster_columns     | view     | mdione | 0 bytes    |
 public | raster_overviews   | view     | mdione | 0 bytes    |
 public | spatial_ref_sys    | table    | mdione | 3224 kB    |

 Schema |           Name           | Type  | Owner  |       Table        |  Size   | Description
--------+--------------------------+-------+--------+--------------------+---------+-------------
 public | contours_height          | index | mdione | contours           | 268 MB  |
 public | contours_pkey            | index | mdione | contours           | 268 MB  |
 public | contours_way_gist        | index | mdione | contours           | 1144 MB |
 public | planet_osm_line_index    | index | mdione | planet_osm_line    | 4024 MB |
 public | planet_osm_line_pkey     | index | mdione | planet_osm_line    | 756 MB  |
 public | planet_osm_nodes_pkey    | index | mdione | planet_osm_nodes   | 16 kB   |
 public | planet_osm_point_index   | index | mdione | planet_osm_point   | 1494 MB |
 public | planet_osm_point_pkey    | index | mdione | planet_osm_point   | 566 MB  |
 public | planet_osm_polygon_index | index | mdione | planet_osm_polygon | 8210 MB |
 public | planet_osm_polygon_pkey  | index | mdione | planet_osm_polygon | 1955 MB |
 public | planet_osm_rels_idx      | index | mdione | planet_osm_rels    | 352 kB  |
 public | planet_osm_rels_parts    | index | mdione | planet_osm_rels    | 676 MB  |
 public | planet_osm_rels_pkey     | index | mdione | planet_osm_rels    | 38 MB   |
 public | planet_osm_roads_index   | index | mdione | planet_osm_roads   | 358 MB  |
 public | planet_osm_roads_pkey    | index | mdione | planet_osm_roads   | 78 MB   |
 public | planet_osm_ways_idx      | index | mdione | planet_osm_ways    | 2165 MB |
 public | planet_osm_ways_nodes    | index | mdione | planet_osm_ways    | 52 GB   |
 public | planet_osm_ways_pkey     | index | mdione | planet_osm_ways    | 6926 MB |
 public | spatial_ref_sys_pkey     | index | mdione | spatial_ref_sys    | 104 kB  |

3MiB more of points, 21+5+1MiB more of rels, 45+1MiB more of roads, 0+2+8MiB more of lines, 0+3MiB for polygons, 0+4+4MiB for ways. In total, some 97MiB more. I tried a VACUUM at the end, but no space was gained, and I don't have enough space for VACUUM FULL. As VACUUM does not defragment, a second and third update should make use of the internal fragmentation. Let's see.

363.osc.gz is the smallest file I have, at ~22MiB. The times are internally different, but overall it looks proportional:

$ osm2pgsql --append --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache --cache 2048 --number-processes 4 --bbox -11.9531,34.6694,29.8828,58.8819 363.osc.gz
Maximum node in persistent node cache: 2701910015

Reading in file: 363.osc.gz
Processing: Node(750k 3.3k/s) Way(128k 0.44k/s) Relation(4264 15.73/s)  parse time: 792s

Node stats: total(750191), max(2703147051) in 230s
Way stats: total(128987), max(264655143) in 291s
Relation stats: total(4264), max(3556985) in 271s

Going over pending ways...
Maximum node in persistent node cache: 2703148031
        94490 ways are pending

Using 4 helper-processes
Process 0 finished processing 23623 ways in 238 sec
Process 2 finished processing 23622 ways in 238 sec
Process 1 finished processing 23623 ways in 238 sec
Process 3 finished processing 23622 ways in 239 sec
94490 Pending ways took 241s at a rate of 392.07/s

Going over pending relations...
Maximum node in persistent node cache: 2703148031
        8413 relations are pending

Using 4 helper-processes
Process 1 finished processing 2103 relations in 443 sec
Process 3 finished processing 2103 relations in 445 sec
Process 0 finished processing 2104 relations in 450 sec
Process 2 finished processing 2103 relations in 452 sec
8413 Pending relations took 453s at a rate of 18.57/s

node cache: stored: 576093(100.00%), storage efficiency: 60.50% (dense blocks: 437, sparse nodes: 252366), hit rate: 5.07%

Osm2pgsql took 1488s overall

The table sizes keep growing, as expected: OSM data does nothing but grow; my free space does nothing but shrink, currently at a mere 249MiB. Given that the intermediate tables are dropped at the end of the second import phase, it only makes sense to do full imports from time to time, before updating the rendering rules. Minutely updates are not for me.



Joaquin Tita: Detailed Interaction Design

The information that a website or application exhibits to a user can be organised in different ways. A user navigates looking for information with a specific objective; the easier and faster they find it, the greater their satisfaction and productivity will be. For this reason, we should provide tools that help the user achieve their goals. Navigation menus are one such tool, helping the user navigate the information architecture of a website using dialog boxes, "boxes" (of any shape), images or even simply text.

Let's illustrate with a hypothetical situation where a customer wants to buy a product from company A. This company has one of the best products on the market. Its website contains a lot of information about the products offered and the company itself, but the data is unstructured and presented in a single page. On the other side of the river, the competitor, B, has a well-organised site with a simple top menu for navigating the information, concentrating the main services, products and company information. This menu gives the user direct access to the information with a single click. So the customer visits A's website looking for a product but, disoriented by such a huge amount of information and an unnatural navigation style, finally gives up. Next, the customer goes to the competitor's website and, with just a couple of clicks using the top menu, finds and buys the desired product.

Most of the time, users are not patient; in fact, they lose their patience really easily. Currently there are lots of websites that, despite having menus, still have navigation problems. Moreover, achieving a good navigation design is not an easy task, and it should be tackled properly.
Menu Tree - www.lushai.com
Three distinct goals must be met to achieve decent navigation.
  • Provide a means to go from one place to another without overcomplicating the connections.
  • Communicate the relationship between the elements it contains.
  • Communicate the relationship between its contents and the page the user is currently viewing. 
There are different types of navigation designs with different characteristics.

Global Navigation

This type of navigation design provides access to the main areas or key points of a site. Wherever the user wants to go, he will eventually get there through it.
Global Navigation
Local Navigation
In this kind of navigation design, the user moves through the parent, siblings and children of the current element; this style provides access to nearby elements in the architecture.
Local Navigation
Supplementary Navigation
This navigation design provides shortcuts to parts of the architecture that are not easily reachable by global or local navigation, while at the same time maintaining a hierarchical structure.
Supplementary Navigation
Contextual or Inline Navigation
Sometimes, while navigating content, the user needs extra information. Instead of making him scan through the content or search outside the site, this design embeds hyperlinks to the extra data. Using this type of navigation design without understanding the user's needs can lead to confusion.
Contextual Navigation
Courtesy Navigation 
This design provides elements that are not needed on a regular basis but are offered as a convenience. Typical examples are feedback forms, contact information, and policies and principles.
Courtesy Navigation
Remote Navigation
In this design, the navigational device is not embedded in the structure and is independent from the content and the functionality. Site maps and indexes (also called "web site A-Z indexes") are clear examples of this type. When a user can't find what he wants using the other navigational styles, he leans toward this type. The site map is an outline of the architecture of the site, with links in hierarchical order; site maps usually provide at most two levels of depth. The most common way of building index navigation is with a list of links to important elements, alphabetically ordered.
Site Index

Screen Layout Diagram
A screen layout diagram shows all the elements that form the interaction context and how they are placed in a window or page. The layout defines the size, spacing, emphasis and location of the GUI elements. Good layouts help users find what they are looking for without neglecting visual appearance. Nowadays this is a highly sought-after characteristic in products of any kind. Google's Play Store and Apple's App Store have basic guidelines that each app can follow concerning design, screen layout and also internal programming structure and details. Sometimes the process starts by drawing the layout on paper with the elements placed, to see how well they fit. There are also tools like InVision and Balsamiq which facilitate designing layout prototypes with predefined drag-and-drop elements.
Balsamiq App


Patterns
Although how we access the information is important, so is how the user consumes it. Users unconsciously use different patterns while reading the content of a website, depending on whether it attracts or repels their attention.

Reading Pattern
Users read from left to right and from top to bottom, following the layout or visual structure of the page. Most of the words are read, but content that seems unimportant or requires a lot of effort is skipped. This pattern is similar to the one used while reading a book.

F-Shaped Pattern For Reading Web Content
In a study conducted by Jakob Nielsen, he recorded how 232 users looked at thousands of web pages. The findings indicate that the dominant reading pattern looks like an F shape and has three components:
  • At the top area of the site, users read in a horizontal movement.
  • After that, users move down the page a bit and read across in a second horizontal movement.
  • Lastly, users scan the left side of the content in a vertical movement.
Using heatmaps from eye tracking, he distinguished an F pattern.
F-shaped pattern

Arching Pattern
In this pattern, the user scans the page starting in the upper left corner and ending in the lower right. The upper right corner (the strong fallow area) is sometimes noticed, while the lower left corner is called the weak fallow area. This scanning path is also known as the Gutenberg Diagram.
Gutenberg Path

Small Screen Pattern
In devices where the screen size is limited, like smartphones and some tablets, the scanning pattern is different. The path starts in the upper left corner and goes straight down until the end of the screen. Once the device is rotated to landscape, the screen width expands and the pattern converts to something similar to the reading pattern.

Conversation Pattern
"Monologue conversations" explain to the user everything.  They answer every question that he could have without involving the user. Sometimes it is a good approach but the user doesn't have a voice in deciding what to read or what to omit.

Sign Up with Long License 
Inverted pyramid style conversations display a summary with the important or essential information first and then progressively disclose details. This approach lets the user decide when to stop reading: he finds the useful information sooner, and if he thinks it necessary he can continue reading for extra data.

Inverted Pyramid Style
Styleguides
They are documents that compile and explain all the information regarding a software product's (or suite of products') user interfaces. In general, companies include templates, design controls and rules, logos, colours, typographies, illustrations and photographs. The main purpose is to have a centralised store for consistency, developed in an iterative process, and to communicate user experience standards across an organisation.
Some important concepts should be kept in mind while creating or updating a styleguide.
  • Keep the audience in mind - Different people inside a company, such as developers, designers and business analysts, can use these elements.
  • Plan for success - Think in advance what can make your styleguide successful in your organisation.
  • Keep it alive - Styleguides soon become outdated; for that reason they should be produced in a way that can be easily maintained and supported.
  • Define a review process - Define a specific process for reviewing and modifying the styleguide.
  • Think of the platform differences - The different platforms available in the market force us to choose between supporting a specific platform or being neutral. A platform-neutral styleguide will be harder to maintain, but its reach will be bigger.
  • Socialize the document in your organisation - Promote its use throughout all the levels of the organisation. This ensures that everybody knows it exists, understands it and actively uses it. The more, the better.
  • Clearly define mandatory and flexible standards - Ensure support for new platforms and new creative ideas by setting mandatory standards that still leave room for flexibility.
  • Make the styleguide as scannable and searchable as possible - Provide searching and browsing capabilities that make it easier for readers to find what they are looking for. Visual examples are always desirable when possible.
  • Provide real world examples - Illustrate with examples from real applications to demonstrate your point. If it is a large organisation with different applications, try to cover as many of them as possible.


Everything counts when it comes to enhancing user interaction, so it is better to focus effort on it.
"Interaction design is about behaviour, how things work. [...]Defining what happens when a person uses a product or service is what interaction designers do.[...]The reason we do it is to enable connections interactions between people.[...]All of these things and many, many more are about connecting people and helping them communicate better between themselves and the world." (by Dan Saffer)

Marcos Dione: osm-planet-importing-and-rendering-times

For at least four months I've been trying to import the whole of Europe in slim mode so it would allow updates. The computer is a Lenovo quad-core with 8GiB of RAM and, initially, 500GiB of disk. Last time I tried with the HDD alone it took like 3 days to import just the data, and more than a week passed before I got tired and canceled the index creation. That's the most expensive part of the import, and reading the data and writing the indexes on a seeking device is slow.

So I bought a 256GB SSD[1] and wanted to try again. I took 15GiB for the system and left the rest to share between my files and postgres, keeping the data files on the HDD. At first I tried importing the whole of Europe using 6GiB of cache; remember that my computer has 8GiB of RAM, so I thought it would fit. It actually didn't, and the import was killed by the OOM killer. I had logs that showed osm2pgsql's and the different postgres processes' memory usage, but somehow I lost them; if I find them I'll post them. I lowered the cache to 4GiB but it was still too big and the OOM killer was triggered again.

So I lowered the cache size to 2GiB, but then I was running out of disk space. I tried using osm2pgsql --bbox to import only from Iceland down to somewhere between Κύπρος (Cyprus) and Κρήτη (Crete), so it includes Istanbul and Sicilia, but it was still too big. So I started wondering about the sizes of OSM data. I ducked and googled around[3] for them to no avail, but then jburgess, the tile server sysadmin, answered my question on the IRC channel[4] with these numbers:

 gis=# \d+
  Schema |        Name        | Type  | Owner |    Size    | Description
 --------+--------------------+-------+-------+------------+-------------
  public | geography_columns  | view  | tile  | 0 bytes    |
  public | geometry_columns   | view  | tile  | 0 bytes    |
  public | planet_osm_line    | table | tile  | 44 GB      |
  public | planet_osm_nodes   | table | tile  | 8192 bytes | *
  public | planet_osm_point   | table | tile  | 4426 MB    |
  public | planet_osm_polygon | table | tile  | 52 GB      |
  public | planet_osm_rels    | table | tile  | 1546 MB    | *
  public | planet_osm_roads   | table | tile  | 7035 MB    |
  public | planet_osm_ways    | table | tile  | 59 GB      | *
  public | raster_columns     | view  | tile  | 0 bytes    |
  public | raster_overviews   | view  | tile  | 0 bytes    |
  public | spatial_ref_sys    | table | tile  | 3216 kB    |

 gis=# \di+
  Schema |           Name           | Type  | Owner |       Table        |    Size    | Description
 --------+--------------------------+-------+-------+--------------------+------------+-------------
  public | ferry_idx                | index | tile  | planet_osm_line    | 824 kB     |
  public | leisure_polygon_idx      | index | tile  | planet_osm_polygon | 1437 MB    |
  public | national_park_idx        | index | tile  | planet_osm_polygon | 1608 kB    |
  public | planet_osm_line_index    | index | tile  | planet_osm_line    | 8937 MB    |
  public | planet_osm_line_pkey     | index | tile  | planet_osm_line    | 2534 MB    |
  public | planet_osm_nodes_pkey    | index | tile  | planet_osm_nodes   | 8192 bytes | *
  public | planet_osm_point_index   | index | tile  | planet_osm_point   | 2565 MB    |
  public | planet_osm_point_pkey    | index | tile  | planet_osm_point   | 1232 MB    |
  public | planet_osm_polygon_index | index | tile  | planet_osm_polygon | 9295 MB    |
  public | planet_osm_polygon_pkey  | index | tile  | planet_osm_polygon | 3473 MB    |
  public | planet_osm_rels_idx      | index | tile  | planet_osm_rels    | 208 kB     | *
  public | planet_osm_rels_parts    | index | tile  | planet_osm_rels    | 2837 MB    | *
  public | planet_osm_rels_pkey     | index | tile  | planet_osm_rels    | 75 MB      | *
  public | planet_osm_roads_index   | index | tile  | planet_osm_roads   | 1151 MB    |
  public | planet_osm_roads_pkey    | index | tile  | planet_osm_roads   | 301 MB     |
  public | planet_osm_ways_idx      | index | tile  | planet_osm_ways    | 2622 MB    | *
  public | planet_osm_ways_nodes    | index | tile  | planet_osm_ways    | 112 GB     | *
  public | planet_osm_ways_pkey     | index | tile  | planet_osm_ways    | 10 GB      | *
  public | spatial_ref_sys_pkey     | index | tile  | spatial_ref_sys    | 144 kB     |
  public | water_areas_idx          | index | tile  | planet_osm_polygon | 564 MB     |
  public | water_lines_idx          | index | tile  | planet_osm_line    | 38 MB      |

[*] These are the intermediate tables and their indexes

So, around 167GiB of data and around 158GiB of indexes, of which 60GiB and 127GiB respectively are intermediate. These intermediate tables and indexes are used later during the updates. Clearly I couldn't import the whole planet, but surely Europe should fit in ~210GiB? planet.pbf weighs 24063MiB and europe.pbf scales at 12251MiB, so a little more than 50%. It should fit, but somehow it doesn't.
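The MiB figures were recovered by hand from psql's human-readable output (see footnote [5]); the conversion can be sketched quickly. This is just an illustration: the unit table below only covers the units pg_size_pretty emits in the listings above.

```python
# Convert psql's human-readable sizes ("44 GB", "4426 MB", "8192 bytes")
# back to MiB, as done by hand in footnote [5]. Only the units appearing
# in the listings above are handled; this is a sketch, not a full parser.
UNIT_TO_MIB = {"bytes": 1 / (1024 * 1024), "kB": 1 / 1024, "MB": 1, "GB": 1024}

def size_to_mib(text):
    value, unit = text.split()
    return float(value) * UNIT_TO_MIB[unit]

# The planet tables from the first listing (including the intermediate ones):
sizes = ["44 GB", "8192 bytes", "4426 MB", "52 GB", "1546 MB", "7035 MB", "59 GB"]
total_mib = sum(size_to_mib(s) for s in sizes)
print(int(total_mib / 1024))  # → 167, the ~167GiB of table data quoted above
```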

Having no more free space, I decided both to create a new tablespace on the HDD and put the data tables there, keeping the rest in the SSD, and to further reduce the north limit to the British Isles, cutting out Iceland and a good part of Scandinavia. osm2pgsql supports the former with its --tablespace-main-data option. This is a summary of the successful import, with human readable times between brackets added by me:

mdione@diablo:~/src/projects/osm/data/osm$ osm2pgsql --create --database gis --slim --cache 2048 --number-processes 4 --unlogged --tablespace-main-data hdd --bbox -11.9531,34.6694,29.8828,58.8819 europe-latest.osm.pbf
Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: pgsql, scale=100 cache=2048

Reading in file: europe-latest.osm.pbf
Processing: Node(990001k 263.4k/s) Way(139244k 11.37k/s) Relation(1749200 217.43/s)  parse time: 24045s [~6h40]

Node stats: total(990001600), max(2700585940) in 3758s [~1h03]
Way stats: total(139244632), max(264372509) in 12242s [~3h24]
Relation stats: total(1749204), max(3552177) in 8045s [~2h14]

Going over pending ways...
        100666720 ways are pending

Using 4 helper-processes
100666720 Pending ways took 21396s [~5h57] at a rate of 4704.93/s

node cache: stored: 197941325(19.99%), storage efficiency: 73.74% (dense blocks: 132007, sparse nodes: 66630145), hit rate: 20.02%
Stopped table: planet_osm_nodes in 1s
Stopped table: planet_osm_rels in 44s
All indexes on  planet_osm_point created  in 4006s [~1h07]
All indexes on  planet_osm_roads created  in 5894s [~1h38]
All indexes on  planet_osm_line created  in 11834s [~3h17]
All indexes on  planet_osm_polygon created  in 14862s [~4h07]
Stopped table: planet_osm_ways in 26122s [~7h15]

Osm2pgsql took 72172s overall [~20h24]

So, ~20h24 of import time, of which ~6h40 is for the intermediate data, which went into the SSD; almost 6h importing the real data, which went into the HDD; and the rest indexing, which went again into the SSD. This is the final disk usage:

 Schema |        Name        | Type  | Owner  |  Size    | Description
--------+--------------------+-------+--------+----------+-------------
 public | geography_columns  | view  | mdione |  0 bytes |
 public | geometry_columns   | view  | mdione |  0 bytes |
 public | planet_osm_line    | table | mdione | 11264 MB | **
 public | planet_osm_nodes   | table | mdione | 43008 MB |
 public | planet_osm_point   | table | mdione |  2181 MB | **
 public | planet_osm_polygon | table | mdione | 23552 MB | **
 public | planet_osm_rels    | table | mdione |   871 MB |
 public | planet_osm_roads   | table | mdione |  2129 MB | **
 public | planet_osm_ways    | table | mdione | 43008 MB |
 public | raster_columns     | view  | mdione |  0 bytes |
 public | raster_overviews   | view  | mdione |  0 bytes |
 public | spatial_ref_sys    | table | mdione |     3 MB |
--------+--------------------+-------+--------+----------+--------------
total                                          126016 MB   (39126 MB in 'hdd' [**])

 Schema |            Name             | Type  | Owner  |       Table        |  Size    | Description
--------+-----------------------------+-------+--------+--------------------+----------+-------------
 public | planet_osm_line_index       | index | mdione | planet_osm_line    |  4105 MB |
 public | planet_osm_line_pkey        | index | mdione | planet_osm_line    |   748 MB |
 public | planet_osm_nodes_pkey       | index | mdione | planet_osm_nodes   | 21504 MB |
 public | planet_osm_point_index      | index | mdione | planet_osm_point   |  1506 MB |
 public | planet_osm_point_pkey       | index | mdione | planet_osm_point   |   566 MB |
 public | planet_osm_point_population | index | mdione | planet_osm_point   |   566 MB |
 public | planet_osm_polygon_index    | index | mdione | planet_osm_polygon |  8074 MB |
 public | planet_osm_polygon_pkey     | index | mdione | planet_osm_polygon |  1953 MB |
 public | planet_osm_rels_idx         | index | mdione | planet_osm_rels    |    16 kB | *
 public | planet_osm_rels_parts       | index | mdione | planet_osm_rels    |   671 MB |
 public | planet_osm_rels_pkey        | index | mdione | planet_osm_rels    |    37 MB |
 public | planet_osm_roads_index      | index | mdione | planet_osm_roads   |   359 MB |
 public | planet_osm_roads_pkey       | index | mdione | planet_osm_roads   |    77 MB |
 public | planet_osm_ways_idx         | index | mdione | planet_osm_ways    |  2161 MB |
 public | planet_osm_ways_nodes       | index | mdione | planet_osm_ways    | 53248 MB |
 public | planet_osm_ways_pkey        | index | mdione | planet_osm_ways    |  6926 MB |
 public | spatial_ref_sys_pkey        | index | mdione | spatial_ref_sys    |   144 kB | *
--------+-----------------------------+-------+--------+--------------------+-----------+
total                                                                        102501 MB

[*] Too small, not counted
[**] In tablespace 'hdd', which is in the HDD.

That's a total of 228517MiB for this partial Europe import, of which 171434MiB are for the intermediate data. It's slightly more than I have to spare in the SSD, so I would have to cut off still more data if I wanted to import everything into the SSD. Then I tried to render with this, but it was awfully slow.

Luckily, when jburgess answered with the sizes, he also suggested using flat nodes. This is an osm2pgsql option that uses a specially formatted file to store the intermediate data instead of postgres tables. According to the manpage, it is faster both for the import and for the successive updates, and uses only about 16GiB of disk space, which is around 10% of what my import used for the intermediate data, but «[t]his mode is only recommended for full planet imports as it doesn't work well with small extracts». I tried anyway.
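The flat nodes file is essentially one big array on disk indexed by node id: each node gets a fixed-size slot holding its coordinates as fixed-point integers, so a lookup is a single seek instead of a B-tree traversal. A simplified sketch of the idea (the real osm2pgsql format differs in details, and the function names here are made up):

```python
import struct, tempfile

# One 8-byte slot per node id: lat/lon as two int32s in 1e-7 degree units.
# File offset = node_id * slot size, so a lookup is a single seek.
RECORD = struct.Struct("<ii")
SCALE = 10_000_000

def store_node(f, node_id, lat, lon):
    f.seek(node_id * RECORD.size)
    f.write(RECORD.pack(round(lat * SCALE), round(lon * SCALE)))

def load_node(f, node_id):
    f.seek(node_id * RECORD.size)
    lat, lon = RECORD.unpack(f.read(RECORD.size))
    return lat / SCALE, lon / SCALE

with tempfile.TemporaryFile() as f:
    store_node(f, 123456, 48.8584, 2.2945)
    print(load_node(f, 123456))  # → (48.8584, 2.2945)
```

Note that with the maximum node id from the log below (2,700,585,940), a dense file of 8-byte slots works out to ~20605MiB, suspiciously close to the 20608MiB the real nodes.cache ended up using.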

So I used that option to create the flat node cache on the SSD and put all the data and indexes there too. Here's the summary:

mdione@diablo:~/src/projects/osm/data/osm$ osm2pgsql --create --drop --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache --cache 2048 --number-processes 4 --unlogged --bbox -11.9531,34.6694,29.8828,58.8819 europe-latest.osm.pbf
Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: pgsql, scale=100 cache=2048

Reading in file: europe-latest.osm.pbf
Processing: Node(990001k 914.1k/s) Way(139244k 17.64k/s) Relation(1749200 344.60/s)  parse time: 14052s [~3h54]

Node stats: total(990001600), max(2700585940) in 1083s [~0h18]
Way stats: total(139244632), max(264372509) in 7893s [~2h11]
Relation stats: total(1749204), max(3552177) in 5076s [~1h24]

Going over pending ways...
        100666720 ways are pending

Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
100666720 Pending ways took 29143s [~8h05] at a rate of 3454.23/s

node cache: stored: 197941325(19.99%), storage efficiency: 73.74% (dense blocks: 132007, sparse nodes: 66630145), hit rate: 18.98%
Stopped table: planet_osm_nodes in 0s
Stopped table: planet_osm_rels in 0s
All indexes on  planet_osm_roads created  in 1023s [~0h17]
All indexes on  planet_osm_point created  in 1974s [~0h33]
All indexes on  planet_osm_line created  in 4354s [~1h12]
All indexes on  planet_osm_polygon created  in 6777s [~1h52]
Stopped table: planet_osm_ways in 2s

Osm2pgsql took 50092s overall [~13h54]

So we went from 20h24 down to 13h54 for the whole operation, from 6h40 down to 3h54 for the intermediate data, from 5h57 up to 8h05 for the real data, and a lot less time for the indexing: around a third for each real data table, and from 7h15 all the way down to 0 for the intermediate data. So even though the real data processing took more than 2h longer, the whole import takes only ~69% of the time, uses less space, and fits in my SSD with a lot of space to spare. For reference, the file nodes.cache uses only 20608MiB of disk space, which is ~12% of the space used by the intermediate postgres tables.
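As a sanity check, the overall speedup follows directly from the two totals reported in the logs above:

```python
# Wall-clock import times reported by osm2pgsql in the two runs above.
hdd_run = 72172    # intermediate data in postgres tables, real data on the HDD
flat_run = 50092   # flat nodes file, everything on the SSD

print(f"{flat_run / hdd_run:.0%}")  # → 69%
print(f"{(hdd_run - flat_run) / 3600:.1f} hours saved")  # → 6.1 hours saved
```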

So, now, what about rendering time? This question is not easy to answer. I set up a very rough benchmark, which consists of rendering only one tile for each zoom level, over a small town chosen without any particular criteria[6].
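The loop behind such a benchmark can be sketched as follows; deg2num is the standard slippy-map formula for the tile covering a coordinate at a given zoom, and render_tile is a hypothetical stand-in for the actual renderer (in reality, the modified generate_tiles.py driving Mapnik):

```python
import math, time

def deg2num(lat, lon, zoom):
    """Standard slippy-map formula: tile (x, y) covering (lat, lon) at a zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def benchmark(render_tile, lat, lon, zooms):
    """Render the tile over (lat, lon) once per zoom level, timing each one."""
    times = {}
    for z in zooms:
        x, y = deg2num(lat, lon, z)
        start = time.monotonic()
        render_tile(z, x, y)  # hypothetical renderer; Mapnik does the real work
        times[z] = time.monotonic() - start
    return times

# Demo with a stub renderer (and an arbitrary point) so the sketch runs as-is:
times = benchmark(lambda z, x, y: None, 48.8584, 2.2945, range(19))
print(sorted(times))  # zoom levels 0..18, one timed render each
```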

I used Tilemill to export my modified version of openstreetmap-carto to a Mapnik XML file, and used a modified generate_tiles.py to measure the rendering times. This is the resulting logarithmic graph:

Notice how the render time increases exponentially (it looks linear in the graph) between zoom levels 0 and 5, and then the big peaks (up to 720s!) for zoom levels 6 to 8. This is definitely worse than the render times I used to have when I imported several countries, but that data never got to the size of this import.

Of course, next weekend I'll fire a full render for the imported region between zoom levels 0 to 14, and then I'll have better numbers to share.

Meanwhile, when I tried to update the data, it failed:

mdione@diablo:~/src/projects/osm/data/osm$ osm2pgsql --append --database gis --slim --flat-nodes /home/mdione/src/projects/osm/nodes.cache --cache 2048 --number-processes 4 362.osc.gz
osm2pgsql SVN version 0.82.0 (64bit id space)

Node-cache: cache=2048MB, maxblocks=262145*8192, allocation method=11
Mid: loading persistent node cache from /home/mdione/src/projects/osm/nodes.cache
Maximum node in persistent node cache: 2701131775
Mid: pgsql, scale=100 cache=2048
Setting up table: planet_osm_nodes
PREPARE insert_node (int8, int4, int4, text[]) AS INSERT INTO planet_osm_nodes VALUES ($1,$2,$3,$4);
PREPARE get_node (int8) AS SELECT lat,lon,tags FROM planet_osm_nodes WHERE id = $1 LIMIT 1;
PREPARE delete_node (int8) AS DELETE FROM planet_osm_nodes WHERE id = $1;
 failed: ERROR:  relation "planet_osm_nodes" does not exist
LINE 1: ...rt_node (int8, int4, int4, text[]) AS INSERT INTO planet_osm...
                                                             ^
Error occurred, cleaning up

Somehow it's trying to use a table that was not created because the intermediate data is in the flat nodes file. I will have to investigate this; I'll try to do it this week.


[1] Actually, the disk is sold as '250GB', which uses the same decimal units as HDDs, so it means that it's only (and the kernel confirms this) 232GiB[2].

[2] I can't come up with a reason why it's not 256GiB; it seems more difficult to fabricate memory in non-power-of-2 sizes.

[3] I found that the verb for using duckduckgo for searching the Internet is «duck».

[4] I have no idea how many times I've joined #osm on Freenode only to end up asking OSM questions in the #joomla channel.

[5] Actually postgres shows the numbers in «human readable sizes», which means that any size above 10240MiB is shown in GiB. I just multiplied those by 1024 to get a rough MiB value.

[6] Some day with more time I'll learn how to use the OSM plugin for ikiwiki.


openstreetmap gis