Gonzalo Martinez: How to configure a RAID0 on AWS

The problem starts with improving IO performance on the database server. It is not a problem right now, but we do not want it to become one either.

First of all, for our renewed server we set up one of the instances with the best memory performance according to Amazon, the R3 instances [0].

R3 instances are optimized for memory-intensive applications and offer the lowest cost per GiB of RAM among the Amazon EC2 instance types.
These servers have an Instance Store [1] backed by SSD disks, but those disks are temporary: if the instance is stopped and then started again, everything that was on that disk simply disappears. That is why we decided to keep backing the DB with EBS [2], and to improve performance a bit more we decided to put two EBS volumes in RAID0.

So let's start by explaining what an EBS is. Amazon says:
Amazon Elastic Block Store (Amazon EBS) provides persistent block-level storage volumes designed for use with Amazon EC2 instances in the AWS cloud.
In short, they are, let's say, on-demand hard disks that come and go through the cloud, that can be attached to any type of EC2 instance, and that provide permanent storage.

Now a bit about something I always found hard to understand: what is a RAID? [3]. Basically it is a set of independent disks used together, generally to add redundancy to storage deployments. Rereading about it to write this post, I remember why I never fully understood it: it has many levels, and some of them are really complex. A level is the name given to each type of configuration that exists. Obviously we are going to explain only some of the most characteristic ones.

RAID0 (data striping) is the use of a set of physical disks as if they were a single one. What it actually does is distribute the data evenly across the disks, so this configuration adds no redundancy to the storage, but it does help improve read and write performance. As for size, it is limited by the smallest of the disks. With two 100 GB disks the size doubles and we end up with a 200 GB disk, since each one contributes 100 GB to the stripe. But with a 300 GB disk and a 100 GB disk, the striping is done on the smallest one and each contributes 100 GB, so we get a 200 GB disk and lose 200 GB of the larger disk.
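The size rule and the even distribution described above can be sketched in a few lines of Python. This is only a simplified model of the rule as stated in this post; the function names are made up for illustration, and a specific RAID implementation may handle disks of different sizes differently.

```python
def raid0_capacity(disk_sizes_gb):
    """Usable size of the array: every disk contributes as much as the smallest one."""
    if not disk_sizes_gb:
        raise ValueError("a RAID0 array needs at least one disk")
    return min(disk_sizes_gb) * len(disk_sizes_gb)


def disk_for_chunk(chunk_index, disk_count):
    """Round-robin striping: consecutive chunks land on consecutive disks."""
    return chunk_index % disk_count


print(raid0_capacity([100, 100]))  # 200
print(raid0_capacity([300, 100]))  # 200, so 200 GB of the larger disk go unused
print([disk_for_chunk(i, 2) for i in range(6)])  # [0, 1, 0, 1, 0, 1]
```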


RAID1 (mirroring) creates an exact copy of every piece of data written to one disk on one or more other disks. This is good for environments where read speed matters more than capacity, since (in its most basic implementation) two disks of 100 GB each would be used and the maximum storage is only as large as the smallest of the disks. In addition, having an exactly copied disk adds redundancy to the set, which is very useful in high-availability environments, since if one disk fails the other can take its place without much trouble.



Is there more? Yes, a lot more; it is a long topic and it can get quite complex. Reading Wikipedia will give you a good overview [3].

Now to our business: how to configure a RAID0 for an EC2 instance on EBS volumes. First we attach two EBS volumes to our instance. This can be done while launching the instance, in the "Add Storage" section, or later from the Volumes panel of EC2, by creating a volume and then attaching it to the instance. In this case we are going to use two 30 GB volumes.

We are cool people, so we use the kind of servers that hold up 95% of the Internet: Linux servers. So we are going to use the "lsblk" command which, according to man, "lists block devices".

We will see something like this:

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda      8:0    0   8G  0 disk
└─sda1   8:1    0   8G  0 part /
sdb      8:16   0  30G  0 disk
sdc      8:32   0  30G  0 disk

The 8G sda disk is the one mounted as root by default on all EC2 instances, and sdb and sdc are the ones that will be used to build the RAID0.

$ sudo mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=number_of_volumes device_1 device_2

Example:
$ sudo mdadm --create --verbose /dev/md0 --level=stripe --raid-devices=2 /dev/sdb /dev/sdc

What this basically does is create a new device called md0 from the block devices sdb and sdc, using the level "stripe", which could also be "0" or "raid0":
-l, --level=
              Set  RAID  level.  When used with --create, options are: linear,
              raid0, 0, stripe, raid1, 1, mirror, raid4, 4, raid5,  5,  raid6,
              6, raid10, 10, multipath, mp, faulty, container.  Obviously some
              of these are synonymous.
Next we create a file system and a mount point to mount the device.

$ sudo mkfs.ext4 /dev/md0
$ sudo mkdir /mnt/md0
$ sudo mount -t ext4 /dev/md0 /mnt/md0

Once mounted we can already use it, but we will have some problems (the mount point will not show up) if we stop and start the instance. To avoid that, the following line must be added to the /etc/fstab file.

/dev/md0 /mnt/md0 ext4 defaults 0 0
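To make the meaning of each column in that line explicit, here is a small Python sketch that splits an fstab entry into its six fields. The field names and the parse_fstab_line helper are my own labels for illustration, and the sketch assumes all six fields are present, which is a simplification.

```python
from collections import namedtuple

# Field names are illustrative labels for the six fstab columns.
FstabEntry = namedtuple(
    "FstabEntry", "device mount_point fs_type options dump fsck_pass"
)


def parse_fstab_line(line):
    """Split one non-comment fstab line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    device, mount_point, fs_type, options, dump, fsck_pass = fields
    return FstabEntry(
        device, mount_point, fs_type, options.split(","), int(dump), int(fsck_pass)
    )


entry = parse_fstab_line("/dev/md0 /mnt/md0 ext4 defaults 0 0")
print(entry.device, entry.mount_point, entry.fs_type, entry.options)
```

The two trailing zeros mean the file system is neither dumped nor checked at boot.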

As the fstab man page says:

The file fstab contains descriptive information about the various file systems. fstab is only read by programs, and not written; it is  the duty  of  the system administrator to properly create and maintain this file.

And that's it, we now have our RAID0 configured on our EC2 instance or, really, on any Linux.

You can test how IO performance improves with different tools such as hdparm [4] or bonnie++ [5].

More information on how to do this configuration, and its advantages and disadvantages, at the following link [6].

[0] http://aws.amazon.com/es/ec2/instance-types/
[1] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html
[2] http://aws.amazon.com/es/ebs/
[3] http://es.wikipedia.org/wiki/RAID
[4] http://es.wikipedia.org/wiki/Hdparm
[5] http://en.wikipedia.org/wiki/Bonnie++
[6] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html

Martín Gaitán: The numbers of the World Cup

A friend told me yesterday that World Cup fever no longer affects him the way it used to. When he was a kid, he said, the anticipation of Christmas, the Three Kings and, above all, the World Cups would not let him sleep.

I never cared much about Santa Claus and the Three Kings (they only occasionally came to my house, especially when we listened to my older brother, who instructed us to leave Balthazar a well-chilled dark beer to get through the heat of the January early mornings). But the same thing still happens to me with the football World Cup: fever. There is no event I long for more than that one.

In '86 I was already walking on my own, but my World Cup memories only start in '90 with Cameroon's goal against Pumpido (who still had all his fingers, although you could not tell). My home was so un-football-minded, yet I was so much the opposite, that I have a sharp image of it: when Canni scored against Brazil I was at Don Duca's rotisserie, 3 blocks from my house, waiting to be handed a roast chicken I had been sent to buy. Hearing the roar of the neighborhood, someone let me in, and I got to see the goal on the replay on a tiny little TV they kept on a filthy counter. I walked back crying, moved, with a smell of chicken I will never forget.

And now, at last, that little smell is coming back. In the meantime I soak up names and players, statistics and colorful stories. The fever will not let me sleep.

In that procrastination I found the Wikipedia article with the list of all the squads and, like a cold shower, I tried to get some information out of it.

There's a goal (scraper) in your head

I used the same technique as in the analysis of the Córdoba elections: the IPython extension I wrote for using Django's ORM, and PyQuery for scraping.

In [1]:
%load_ext django_orm_magic

My models ended up like this:

In [2]:
%%django_orm

from django.db import models

class Country(models.Model):
    name = models.CharField(max_length=100)

    def __unicode__(self):
        return self.name

class City(models.Model):
    country = models.ForeignKey('Country')
    name = models.CharField(max_length=100)

    def __unicode__(self):
        return "{0}, {1}".format(self.name, self.country)

class Team(models.Model):
    country = models.ForeignKey('Country')
    group = models.CharField(max_length=100)

    def __unicode__(self):
        return unicode(self.country)

class Club(models.Model):
    name = models.CharField(max_length=100)
    country = models.ForeignKey('Country')

    def __unicode__(self):
        return self.name

class Player(models.Model):

    full_name = models.CharField(max_length=100)
    date_of_birth = models.DateField(null=True, blank=True)
    team = models.ForeignKey('Team')
    url = models.URLField(max_length=200, null=True, blank=True)

    place_of_birth = models.ForeignKey('City', null=True, blank=True)
    height = models.FloatField(null=True, blank=True)
    position = models.CharField(max_length=2)
    current_club = models.ForeignKey('Club')
    last_season_apps = models.IntegerField(null=True, blank=True)
    last_season_goals = models.IntegerField(null=True, blank=True)
    national_team_apps = models.IntegerField(null=True, blank=True)
    national_team_goals = models.IntegerField(null=True, blank=True)

    def __unicode__(self):
        return self.full_name

What I needed was to parse each of the tables associated with a squad. For example, Brazil's:

In [3]:
from pyquery import PyQuery
from IPython.display import HTML, Image
pq = PyQuery('http://en.wikipedia.org/wiki/2014_FIFA_World_Cup_squads')
pq.make_links_absolute()
Out[3]:
[<html.client-nojs>]
In [4]:
brazil = pq('table:first').html()
HTML(brazil)
Out[4]:
No.  Pos.  Player            DoB/Age                                   Caps  Club
1    1GK   Jefferson         (1983-01-02) 2 January 1983 (aged 31)     9     Brazil Botafogo
2    2DF   Dani Alves        (1983-05-06) 6 May 1983 (aged 31)         74    Spain Barcelona
3    2DF   Thiago Silva (c)  (1984-09-22) 22 September 1984 (aged 29)  45    France Paris Saint-Germain
4    2DF   David Luiz        (1987-04-22) 22 April 1987 (aged 27)      35    England Chelsea
5    3MF   Fernandinho       (1985-05-04) 4 May 1985 (aged 29)         6     England Manchester City
6    2DF   Marcelo           (1988-05-12) 12 May 1988 (aged 26)        30    Spain Real Madrid
7    4FW   Hulk              (1986-07-25) 25 July 1986 (aged 27)       34    Russia Zenit Saint Petersburg
8    3MF   Paulinho          (1988-07-25) 25 July 1988 (aged 25)       25    England Tottenham Hotspur
9    4FW   Fred              (1983-10-03) 3 October 1983 (aged 30)     33    Brazil Fluminense
10   4FW   Neymar            (1992-02-05) 5 February 1992 (aged 22)    48    Spain Barcelona
11   3MF   Oscar             (1991-09-09) 9 September 1991 (aged 22)   30    England Chelsea
12   1GK   Júlio César       (1979-09-03) 3 September 1979 (aged 34)   79    Canada Toronto
13   2DF   Dante             (1983-10-18) 18 October 1983 (aged 30)    12    Germany Bayern Munich
14   2DF   Maxwell           (1981-08-27) 27 August 1981 (aged 32)     8     France Paris Saint-Germain
15   2DF   Henrique          (1986-10-14) 14 October 1986 (aged 27)    5     Italy Napoli
16   3MF   Ramires           (1987-03-24) 24 March 1987 (aged 27)      42    England Chelsea
17   3MF   Luiz Gustavo      (1987-07-23) 23 July 1987 (aged 26)       18    Germany VfL Wolfsburg
18   3MF   Hernanes          (1985-05-29) 29 May 1985 (aged 29)        24    Italy Internazionale
19   3MF   Willian           (1988-08-09) 9 August 1988 (aged 25)      6     England Chelsea
20   3MF   Bernard           (1992-09-08) 8 September 1992 (aged 21)   10    Ukraine Shakhtar Donetsk
21   4FW                     (1987-03-20) 20 March 1987 (aged 27)      16    Brazil Atlético Mineiro
22   1GK   Victor            (1983-01-21) 21 January 1983 (aged 31)    6     Brazil Atlético Mineiro
23   2DF   Maicon            (1981-07-26) 26 July 1981 (aged 32)       71    Italy Roma

So I wrote this little function to store all that data in my models:

In [5]:
def parse_squad(squad, group):
    country, _ = Country.objects.get_or_create(name=pq(squad).prev().prev().prev().text())
    print "Parsing", country
    team, _ = Team.objects.get_or_create(country=country, group=group)

    for row in pq('tr', squad)[2:]:

        position = pq('td:eq(1)', row).text()[-2:]
        full_name = pq('td:eq(2)', row).text()
        print full_name
        url = pq('td:eq(2) a', row).attr('href')

        # read the club and its country from the current row, not from a fixed one
        club_country, _ = Country.objects.get_or_create(name=pq('td:eq(5) span.flagicon a', row).attr('title'))
        club, _ = Club.objects.get_or_create(name=pq('td:eq(5)', row).text(), country=club_country)
        print club
        Player.objects.create(full_name=full_name, url=url, position=position, team=team, current_club=club)


In [6]:
for i, squad in enumerate(pq('table:not(.sortable)')[:32]):
    parse_squad(squad, "ABCDEFGH"[i / 4])
Parsing Brazil
Parsing Cameroon
Parsing Croatia
Parsing Mexico
Parsing Australia
Parsing Chile
Parsing Netherlands
Parsing Spain
Parsing Colombia
Parsing Côte d'Ivoire
Parsing Greece
Parsing Japan
Parsing Costa Rica
Parsing England
Parsing Italy
Parsing Uruguay
Parsing Ecuador
Parsing France
Parsing Honduras
Parsing Switzerland
Parsing Argentina
Parsing Bosnia and Herzegovina
Parsing Iran
Parsing Nigeria
Parsing Germany
Parsing Ghana
Parsing Portugal
Parsing United States
Parsing Algeria
Parsing Belgium
Parsing Russia
Parsing South Korea
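The group letter in the loop above comes from the position of each squad table on the page: four teams per group, in page order. A minimal standalone sketch of that mapping, with a hypothetical helper name and `//` so that it behaves the same on Python 2 and Python 3:

```python
GROUPS = "ABCDEFGH"
TEAMS_PER_GROUP = 4


def group_for(table_index):
    """Map the position of a squad table on the page (0-31) to its group letter."""
    return GROUPS[table_index // TEAMS_PER_GROUP]


print([group_for(i) for i in (0, 3, 4, 31)])  # ['A', 'A', 'B', 'H']
```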
In [7]:
Player.objects.all()
Out[7]:
[<Player: Júlio César>, <Player: Jefferson>, <Player: Victor>, <Player: Daniel Alves>, <Player: Maicon>, <Player: Thiago Silva ( captain )>, <Player: David Luiz>, <Player: Marcelo>, <Player: Dante>, <Player: Maxwell>, <Player: Henrique>, <Player: Ramires>, <Player: Oscar>, <Player: Paulinho>, <Player: Hernanes>, <Player: Luiz Gustavo>, <Player: Bernard>, <Player: Fernandinho>, <Player: Willian>, <Player: Neymar>, '...(remaining elements truncated)...']

Luckily Wikipedia gathers lots of obsessives like me, and every player has his own article with a more or less standardized infobox that is also easy to fetch. For example, that of the best player on the planet:

In [8]:
messi = PyQuery(Player.objects.get(full_name__contains='Messi').url)
messi.make_links_absolute()
HTML(messi('table.infobox').html())
Out[8]:
Lionel Messi Lionel Messi Player of the Year 2011.jpg
Messi playing for Barcelona at the 2011 FIFA Club World Cup Personal information Full name Lionel Andrés Messi[1] Date of birth (1987-06-24) 24 June 1987 (age 26)[1] Place of birth Rosario, Argentina[1] Height 1.69 m (5 ft 7 in)[1] Playing position Forward Club information Current club Barcelona Number 10 Youth career 1995–2000 Newell's Old Boys 2000–2003 Barcelona Senior career* Years Team Apps (Gls) 2003–2004 Barcelona C 10 (5) 2004–2005 Barcelona B 22 (6) 2004– Barcelona 276 (243) National team 2004–2005 Argentina U20 18 (14) 2007–2008 Argentina U23 5 (2) 2005– Argentina 83 (37) * Senior club appearances and goals counted for the domestic league only and correct as of 01:38, 17 May 2014 (UTC).

† Appearances (Goals).

‡ National team caps and goals correct as of 11 September 2013
In [9]:
messi('table.infobox').text()
Out[9]:
u"Lionel Messi Messi playing for Barcelona at the 2011 FIFA Club World Cup Personal information Full name Lionel Andr\xe9s Messi [ 1 ] Date of birth ( 1987-06-24 ) 24 June 1987 (age\xa026) [ 1 ] Place of birth Rosario , Argentina [ 1 ] Height 1.69\xa0m (5\xa0ft 7\xa0in) [ 1 ] Playing position Forward Club information Current club Barcelona Number 10 Youth career 1995\u20132000 Newell's Old Boys 2000\u20132003 Barcelona Senior career* Years Team Apps \u2020 (Gls) \u2020 2003\u20132004 Barcelona C 10 (5) 2004\u20132005 Barcelona B 22 (6) 2004\u2013 Barcelona 276 (243) National team \u2021 2004\u20132005 Argentina U20 18 (14) 2007\u20132008 Argentina U23 5 (2) 2005\u2013 Argentina 83 (37) Honours Competitor for Argentina Men's Football Olympic Games Gold 2008 Beijing Olympic Team Copa Am\xe9rica Runner-up 2007 Venezuela Team FIFA U-20 World Cup Winner 2005 Netherlands U-20 Team U-20 South American Championship Third 2005 Colombia U-20 Team * Senior club appearances and goals counted for the domestic league only and correct as of 01:38, 17 May 2014 (UTC). \u2020 Appearances (Goals). \u2021 National team caps and goals correct as of 11 September 2013 Lionel Messi Born Lionel Andr\xe9s Messi ( 1987-06-24 ) 24 June 1987 (age\xa026) Rosario , Santa Fe , Argentina Residence Barcelona , Catalonia , Spain Nationality Argentinian Ethnicity Argentinian & Italian Occupation Association Footballer Salary \u20ac 16\xa0million Religion Roman Catholic Spouse(s) Antonella Roccuzzo Children Thiago Messi Parents Jorge Horacio Messi (father) Celia Maria Cuccittini (mother) Relatives Maxi Biancucchi (cousin) Emanuel Biancucchi (cousin) Website Official Website"
In [10]:
import re
fecha = re.findall(r'\d{4}\-\d{2}\-\d{2}', messi('table.infobox').text())[0]
fecha
Out[10]:
u'1987-06-24'
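The same extraction, written as a standalone Python sketch that does not need the network: it looks for the first ISO-formatted date in a piece of text and turns it into a date object. The first_iso_date name and the shortened sample string are mine; the sample is cut from the infobox text shown above.

```python
import re
from datetime import datetime

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def first_iso_date(text):
    """Return the first YYYY-MM-DD date found in text as a date, or None."""
    match = ISO_DATE.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%Y-%m-%d").date()


sample = "Date of birth ( 1987-06-24 ) 24 June 1987 (age 26) Place of birth Rosario , Argentina"
print(first_iso_date(sample))  # 1987-06-24
```

Returning None instead of raising makes it easier to skip players whose article has no such date.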

After a little experimentation with regular expressions (don't try this at home, folks), I arrived at another little function to extract those infoboxes and fill in the players' data:

In [11]:
from datetime import datetime

def fill_player(player):
    # print 'Retrieving data for %s (%d)' % (player, player.id)
    pq = PyQuery(player.url)
    pq.make_links_absolute()
    info = pq('table.infobox').text()
    # the first ISO date in the infobox is the date of birth
    player.date_of_birth = datetime.strptime(re.findall(r'\d{4}\-\d{2}\-\d{2}', info)[0], "%Y-%m-%d").date()
    try:
        # height given in meters, e.g. "1.69 m"
        player.height = float(re.findall(r'(\d{1}\.\d{2})', info)[0])
    except IndexError:
        # fall back to a height given in centimeters, e.g. "Height 169"
        player.height = float(re.findall(r'Height (\d{3})', info)[0])/100

    try:
        # the "apps (goals)" pair right before the "National team" heading
        player.last_season_apps, player.last_season_goals = re.findall(r'(\d+) \((\d+)\) National team', info)[0]
    except IndexError:
        pass
    try:
        # the last "apps (goals)" pair in the infobox is taken as the senior national team
        player.national_team_apps, player.national_team_goals =  re.findall(r'(\d+) \((\d+)\)', info)[-1]
    except IndexError:
        pass
    player.save()
In [12]:
for player in Player.objects.exclude(url__contains='edit'):
    fill_player(player)
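To see what those regular expressions actually pick up, here is a standalone Python sketch of the two trickier extractions: the height with its centimeter fallback, and the last "apps (goals)" pair. The helper names and the shortened sample string are assumptions for illustration; the patterns are the ones used in fill_player.

```python
import re


def parse_height_m(info):
    """Height in meters, from either a '1.69 m' or a 'Height 169' (centimeters) form."""
    meters = re.search(r"(\d\.\d{2})", info)
    if meters:
        return float(meters.group(1))
    centimeters = re.search(r"Height (\d{3})", info)
    if centimeters:
        return int(centimeters.group(1)) / 100.0
    return None


def last_apps_goals(info):
    """Last 'apps (goals)' pair in the text as integers, or None if there is none."""
    pairs = re.findall(r"(\d+) \((\d+)\)", info)
    if not pairs:
        return None
    apps, goals = pairs[-1]
    return int(apps), int(goals)


sample = ("Height 1.69 m (5 ft 7 in) Senior career 2004- Barcelona 276 (243) "
          "National team 2005- Argentina 83 (37)")
print(parse_height_m(sample))   # 1.69
print(last_apps_goals(sample))  # (83, 37)
```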

And now, finally, we can dig into some numbers.

A matter of years

Who is the oldest player at Brazil 2014? Who is the youngest?

In [13]:
from datetime import date
viejo = Player.objects.filter(date_of_birth__isnull=False).order_by('date_of_birth')[0]
joven = Player.objects.filter(date_of_birth__isnull=False).order_by('-date_of_birth')[0]
In [14]:
today = date.today()
age = lambda player: (today - player.date_of_birth).days / 365.25
print((viejo.full_name, viejo.team, viejo.date_of_birth, age(viejo)))
print((joven.full_name, joven.team, joven.date_of_birth, age(joven)))
(u'Faryd Mondrag\xf3n', <Team: Colombia>, datetime.date(1971, 6, 21), 42.95687885010267)
(u'Fabrice Olinga', <Team: Cameroon>, datetime.date(1996, 5, 12), 18.064339493497606)

The veteran Colombian goalkeeper Faryd Mondragón, at almost 43 years old, is the oldest player of the World Cup. Olinga, the Cameroon forward, is the youngest, having just turned 18.
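The age figures printed above can be reproduced without the database. The sketch below uses the same formula as the age lambda; the run date of 5 June 2014 is an assumption, inferred from the two printed ages rather than stated anywhere in the post.

```python
from datetime import date


def age_in_years(date_of_birth, today):
    """Age as a fractional number of years, using an average year of 365.25 days."""
    return (today - date_of_birth).days / 365.25


run_date = date(2014, 6, 5)  # assumed run date, inferred from the printed ages
print(round(age_in_years(date(1971, 6, 21), run_date), 4))  # 42.9569
print(round(age_in_years(date(1996, 5, 12), run_date), 4))  # 18.0643
```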

We can also plot the ages of the squads: the average age, and the spread between the youngest kid and the oldest veteran of each team.

In [15]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
plt.xkcd()

def team_age(team):
    players = team.player_set.filter(date_of_birth__isnull=False)
    media = sum(map(age, players)) / players.count()
    younger = age(players.order_by('-date_of_birth')[0])
    older = age(players.order_by('date_of_birth')[0])
    return np.array([media, younger, older])


teams = map(unicode, Team.objects.all())
y_pos = np.arange(len(teams))
data = np.vstack(map(team_age, Team.objects.all())).T
plt.figure(figsize=(5,10))
plt.xlim([16, 45])
plt.barh(y_pos, data[0], xerr=[data[0] - data[1], data[2] - data[0]], align='center', alpha=0.3)
plt.yticks(y_pos, teams)
plt.show()
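The xerr argument in the plot above takes two rows: how far the bar extends below the mean and how far above it. A minimal sketch of that calculation in plain Python, with a hypothetical summarize helper and made-up ages:

```python
def summarize(values):
    """Return (mean, distance down to the minimum, distance up to the maximum)."""
    mean = sum(values) / float(len(values))
    return mean, mean - min(values), max(values) - mean


mean, below, above = summarize([21.0, 25.0, 32.0])
print((mean, below, above))  # (26.0, 5.0, 6.0)
```

Because the two distances are usually different, the error bars come out asymmetric, which is exactly what makes the youngest and oldest player of each squad visible.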

Another one for the gossip press: which players have their birthday during the World Cup?

In [16]:
from django.db.models import Q

mundial = Q()
for i in xrange(12,31):
    mundial = mundial | Q(date_of_birth__day=i, date_of_birth__month=6)
for i in xrange(1,14):
    mundial = mundial | Q(date_of_birth__day=i, date_of_birth__month=7)

for p in Player.objects.filter(mundial):
    print "%s of %s has his birthday on %d/%d" % (p, p.team, p.date_of_birth.day, p.date_of_birth.month)
Aurélien Chedjou of Cameroon has his birthday on 20/6
Jean-Armel Kana-Biyik of Cameroon has his birthday on 3/7
Dejan Lovren of Croatia has his birthday on 5/7
Guillermo Ochoa of Mexico has his birthday on 13/7
Miguel Layún of Mexico has his birthday on 25/6
Eugene Galeković of Australia has his birthday on 12/6
Matthew Špiranović of Australia has his birthday on 27/6
Jason Davidson of Australia has his birthday on 29/6
James Troisi of Australia has his birthday on 3/7
Ben Halloran of Australia has his birthday on 14/6
Cristopher Toselli of Chile has his birthday on 15/6
Mauricio Isla of Chile has his birthday on 12/6
José Rojas of Chile has his birthday on 23/6
Jordy Clasie of Netherlands has his birthday on 27/6
Alberto Moreno of Spain has his birthday on 5/7
Faryd Mondragón of Colombia has his birthday on 21/6
Fredy Guarín of Colombia has his birthday on 30/6
James Rodríguez of Colombia has his birthday on 12/7
Sayouba Mandé of Côte d'Ivoire has his birthday on 15/6
Cheick Tioté of Côte d'Ivoire has his birthday on 21/6
Orestis Karnezis of Greece has his birthday on 11/7
José Holebas of Greece has his birthday on 27/6
Kostas Manolas of Greece has his birthday on 14/6
Kostas Katsouranis of Greece has his birthday on 21/6
Andreas Samaris of Greece has his birthday on 13/6
Shusaku Nishikawa of Japan has his birthday on 18/6
Keisuke Honda of Japan has his birthday on 13/6
Heiner Mora of Costa Rica has his birthday on 20/6
Joel Campbell of Costa Rica has his birthday on 26/6
Frank Lampard of England has his birthday on 20/6
Jordan Henderson of England has his birthday on 17/6
Luke Shaw of England has his birthday on 12/7
Alberto Aquilani of Italy has his birthday on 7/7
Antonio Cassano of Italy has his birthday on 12/7
Fernando Muslera of Uruguay has his birthday on 16/6
Juan Carlos Paredes of Ecuador has his birthday on 8/7
João Rojas of Ecuador has his birthday on 14/6
Gökhan Inler ( captain ) of Switzerland has his birthday on 27/6
Hugo Campagnaro of Argentina has his birthday on 27/6
Éver Banega of Argentina has his birthday on 29/6
José Ernesto Sosa of Argentina has his birthday on 19/6
Lionel Messi ( Captain ) of Argentina has his birthday on 24/6
Asmir Begović of Bosnia and Herzegovina has his birthday on 20/6
Sead Kolašinac of Bosnia and Herzegovina has his birthday on 20/6
Srđan Stanić of Bosnia and Herzegovina has his birthday on 6/7
Ghasem Haddadifar of Iran has his birthday on 12/7
Ashkan Dejagah of Iran has his birthday on 5/7
Harrison Afful of Ghana has his birthday on 24/6
Jonathan Mensah of Ghana has his birthday on 13/7
Nick Rimando of United States has his birthday on 17/6
Geoff Cameron of United States has his birthday on 11/7
DeAndre Yedlin of United States has his birthday on 9/7
Islam Slimani of Algeria has his birthday on 18/6
Koen Casteels of Belgium has his birthday on 25/6
Kevin De Bruyne of Belgium has his birthday on 28/6
Vasili Berezutski of Russia has his birthday on 20/6
Roman Shirokov of Russia has his birthday on 6/7
Alan Dzagoev of Russia has his birthday on 17/6
Kwak Tae-Hwi of South Korea has his birthday on 8/7
Hwang Seok-Ho of South Korea has his birthday on 27/6
Son Heung-Min of South Korea has his birthday on 8/7
Park Chu-Young of South Korea has his birthday on 10/7
Lee Chung-Yong of South Korea has his birthday on 2/7
Kim Jin-Su of South Korea has his birthday on 13/6
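The query above builds one condition per day of the tournament. Outside the ORM, the same check can be sketched as a comparison of (month, day) pairs. The helper name is hypothetical, and the trick only works because the window does not wrap around the end of the year.

```python
from datetime import date

START = (6, 12)  # 12 June, first day of the window used in the query above
END = (7, 13)    # 13 July, last day of the window


def birthday_in_window(date_of_birth, start=START, end=END):
    """True if the (month, day) of the birthday falls between start and end, inclusive."""
    return start <= (date_of_birth.month, date_of_birth.day) <= end


print(birthday_in_window(date(1987, 6, 24)))  # True
print(birthday_in_window(date(1996, 5, 12)))  # False
```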
Tall and short
In [17]:
alto = Player.objects.filter(height__isnull=False).order_by('-height')[0]
petiso = Player.objects.filter(height__isnull=False).order_by('height')[0]
print "%s of %s is the tallest player at %.2f m" % (alto, alto.team, alto.height)
print "%s of %s is the shortest player at %.2f m" % (petiso, petiso.team, petiso.height)
Lacina Traoré of Côte d'Ivoire is the tallest player at 2.03 m
Edgar Salli of Cameroon is the shortest player at 1.63 m
In [18]:
def team_height(team):
    players = team.player_set.filter(height__isnull=False)
    media = sum([p.height for p in players]) / players.count()
    petiso = players.order_by('height')[0].height
    alto = players.order_by('-height')[0].height
    return np.array([media, petiso, alto])

data = np.vstack(map(team_height, Team.objects.all())).T
plt.figure(figsize=(5,10))
plt.barh(y_pos, data[0], xerr=[data[0] - data[1], data[2] - data[0]], align='center', alpha=0.3)
plt.xlim([1.55, 2.1])
plt.title(u'Average, maximum and minimum height of the squads')
plt.yticks(y_pos, teams)
plt.show()
A matter of experience
In [19]:
def team_apps(team):
    players = team.player_set.filter(national_team_apps__isnull=False)
    media = sum([p.national_team_apps for p in players]) / float(players.count())  # float() avoids integer division
    nuevito = players.order_by('national_team_apps')[0].national_team_apps
    experimentado = players.order_by('-national_team_apps')[0].national_team_apps
    return np.array([media, nuevito, experimentado])

data = np.vstack(map(team_apps, Team.objects.all())).T
plt.figure(figsize=(5,10))
plt.barh(y_pos, data[0], xerr=[data[0] - data[1], data[2] - data[0]], align='center', alpha=0.3)
plt.title(u'Matches played for the national team')
plt.yticks(y_pos, teams)
plt.show()
The fearsome attack
In [20]:
def team_ataque(team):
    players = team.player_set.filter(position='FW', national_team_goals__isnull=False)
    media = sum([p.national_team_goals for p in players]) / float(players.count())  # float() avoids integer division
    funes_mori = players.order_by('national_team_goals')[0]
    messi = players.order_by('-national_team_goals')[0]
    return (np.array([media, funes_mori.national_team_goals, messi.national_team_goals]), funes_mori, messi)

rows = []
for team in Team.objects.all():
    t, funes_mori, messi = team_ataque(team)
    rows.append(t)
    print "%s. Top scorer: %s (%d). Weakest: %s (%d)" % (team, messi, messi.national_team_goals, funes_mori,
                                                         funes_mori.national_team_goals)

data = np.vstack(rows).T
plt.figure(figsize=(5,10))
plt.barh(y_pos, data[0], xerr=[data[0] - data[1], data[2] - data[0]], align='center', alpha=0.3)
plt.title(u'Goals by forwards for the national team')
plt.yticks(y_pos, teams)
plt.show()
Brazil. Top scorer: Neymar (30). Weakest: Jô (5)
Cameroon. Top scorer: Samuel Eto'o (55). Weakest: Fabrice Olinga (1)
Croatia. Top scorer: Eduardo (29). Weakest: Ante Rebić (1)
Mexico. Top scorer: Javier Hernández (35). Weakest: Raúl Jiménez (4)
Australia. Top scorer: Tim Cahill (31). Weakest: Mathew Leckie (1)
Chile. Top scorer: Alexis Sánchez (22). Weakest: Fabián Orellana (2)
Netherlands. Top scorer: Robin van Persie (42). Weakest: Jeremain Lens (7)
Spain. Top scorer: Fernando Torres (36). Weakest: Diego Costa (0)
Colombia. Top scorer: Radamel Falcao (20). Weakest: Víctor Ibarbo (1)
Côte d'Ivoire. Top scorer: Didier Drogba (63). Weakest: Giovanni Sio (0)
Greece. Top scorer: Theofanis Gekas (24). Weakest: Giorgos Samaras (8)
Japan. Top scorer: Shinji Okazaki (38). Weakest: Manabu Saitō (1)
Costa Rica. Top scorer: Álvaro Saborío (32). Weakest: Jairo Arrieta (4)
England. Top scorer: Wayne Rooney (38). Weakest: Daniel Sturridge (2)
Italy. Top scorer: Mario Balotelli (12). Weakest: Alessio Cerci (0)
Uruguay. Top scorer: Luis Suárez (38). Weakest: Christian Stuani (2)
Ecuador. Top scorer: Felipe Caicedo (15). Weakest: Armando Wila (0)
France. Top scorer: Karim Benzema (19). Weakest: Loïc Rémy (4)
Honduras. Top scorer: Carlo Costly (30). Weakest: Rony Martínez (1)
Switzerland. Top scorer: Mario Gavranović (4). Weakest: Haris Seferović (1)
Argentina. Top scorer: Lionel Messi ( Captain ) (37). Weakest: Rodrigo Palacio (2)
Bosnia and Herzegovina. Top scorer: Edin Džeko (33). Weakest: Edin Višća (0)
Iran. Top scorer: Reza Ghoochannejhad (9). Weakest: Reza Norouzi (0)
Nigeria. Top scorer: Victor Obinna (12). Weakest: Michel Babatunde (0)
Germany. Top scorer: Miroslav Klose (68). Weakest: Kevin Volland (0)
Ghana. Top scorer: Asamoah Gyan (39). Weakest: Jordan Ayew (2)
Portugal. Top scorer: Cristiano Ronaldo ( captain ) (49). Weakest: Éder (0)
United States. Top scorer: Clint Dempsey (36). Weakest: Aron Jóhannsson (2)
Algeria. Top scorer: El Arbi Hillel Soudani (10). Weakest: Islam Slimani (1)
Belgium. Top scorer: Romelu Lukaku (8). Weakest: Divock Origi (0)
Russia. Top scorer: Aleksandr Kerzhakov (25). Weakest: Maksim Kanunnikov (0)
South Korea. Top scorer: Park Chu-Young (24). Weakest: Kim Shin-Wook (3)

If I get through other urgent things (the list is long), I will do more calculations and little charts. For example, one can see which clubs and leagues have the most World Cup players, or which players play for a country other than the one they were born in (for example Gabriel Paletta, a former Boca player and Argentina under-20 international, plays for Italy), and so on.

Something more interesting would be to cross this data with databases from games such as FIFA, or with those that track transfer information. That would let us know not only how expensive a squad is (so that my father can get justifiably indignant about the "dirty business of football" and send me out to buy chicken in the middle of a match) but also what shape each squad arrives in, what characteristics its forwards have (the games assign coefficients for skill, speed, aim, etc.) and much more.

In the meantime, you can download the database to use these little models I put together without having to run the scrapers, which can take a while.

Cheers, and may Argentina win, but above all, may there be many moments like this one:

Joaquin Tita: Internationalization

Every day, millions of users consume a huge amount of information available on the Internet. The explosion of mobile devices has done nothing but encourage this devouring attitude. A few years ago, the only way to get online was a desktop computer. Nowadays, users on their way to work or back home access different websites looking for information to satisfy their desires.

Companies have seen this opportunity and have turned to distributing content worldwide. One of the big issues is that users refuse to be confined to a fixed category, whether cultural, political or linguistic. Customers demand that their ideas, culture and beliefs be respected. The fact is that these customers are no longer situated in a specific area or country; they are located all over the globe. News, politics, shopping and games are some of the main areas that average users prefer. If you want to succeed and increase your number of loyal clients, you have to think globally.

One popular way to reach more users is simply to internationalize your software products. By internationalizing we mean ensuring that a product works well for, and can be easily adapted to, users with different cultures, regions or languages. In the software world this is commonly abbreviated as “i18n”.

Localization versus Internationalization

These two terms are often treated as synonyms across different websites, but the fact is they aren’t.

Localization (also called “l10n”) refers to the process of adapting a product, application or document content to meet the needs of a market in terms of culture, language and other related requirements.

Some customizations that are of concern:
  • Numeric formats.
  • Date and time formats.
  • Use of currency.
  • Keyboard usage.
  • Collation and sorting.
  • Symbols, icons and colors.
  • Text and graphics containing references to objects or actions (which may have different meanings in distinct cultures).
  • Varying legal requirements.
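To make the first two items concrete, here is a minimal Python sketch that formats the same number and date for two locales. The CONVENTIONS table and both function names are hand-written assumptions for illustration only; a real application would take these rules from proper locale data instead of hard-coding them.

```python
from datetime import date

# Illustrative conventions only, hand-written for this example.
CONVENTIONS = {
    "en_US": {"thousands": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de_DE": {"thousands": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
}


def format_number(value, locale_code):
    """Format value with two decimals using the separators of the given locale."""
    conv = CONVENTIONS[locale_code]
    text = "{:,.2f}".format(value)  # e.g. 1,234,567.89
    # swap separators through a placeholder so the replacements do not collide
    return (text.replace(",", "\0")
                .replace(".", conv["decimal"])
                .replace("\0", conv["thousands"]))


def format_date(value, locale_code):
    """Format a date using the field order of the given locale."""
    return CONVENTIONS[locale_code]["date"].format(d=value.day, m=value.month, y=value.year)


for code in ("en_US", "de_DE"):
    print(code, format_number(1234567.89, code), format_date(date(2014, 6, 12), code))
```

The same value reads 1,234,567.89 and 06/12/2014 for en_US, and 1.234.567,89 and 12.06.2014 for de_DE.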

According to the W3C, internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region or language.
Typical activities include:
  • Designing and developing in a way that removes barriers to localization or international deployment (using Unicode, correctly handling legacy character encodings, taking care when concatenating strings, avoiding code that depends on user-interface strings, etc.).
  • Providing support for features that may not be used until localization occurs (supporting bidirectional text or language identification, adding CSS support for vertical text or other non-Latin typographic features).
  • Enabling code to support local, regional, language, or culturally related preferences (including date and time formats, local calendars, number formats and numeral systems, sorting and presentation of lists, handling of personal names and forms of address).
  • Separating localizable elements from source or content, such that localized alternatives can be loaded or selected based on the user's international preferences as needed.
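
As a rough illustration of the last point, here is a minimal Python sketch of keeping localizable text apart from the code that uses it. The catalog layout, the `translate` helper and the message keys are assumptions made up for this example, not part of any particular framework.

```python
# Hypothetical message catalogs, kept apart from the application logic.
# In a real project these would usually live in separate resource files.
CATALOGS = {
    "en": {"inbox": "Inbox", "greeting": "Hello, {name}!"},
    "es": {"inbox": "Bandeja de Entrada", "greeting": "¡Hola, {name}!"},
}

DEFAULT_LANGUAGE = "en"


def translate(key, language, **params):
    """Look up a message by key, falling back to the default language."""
    catalog = CATALOGS.get(language, {})
    template = catalog.get(key)
    if template is None:
        # Missing translation: fall back instead of failing.
        template = CATALOGS[DEFAULT_LANGUAGE][key]
    # Whole-sentence templates with named placeholders avoid building
    # sentences by concatenation, which breaks under different word orders.
    return template.format(**params)


print(translate("inbox", "es"))
print(translate("greeting", "es", name="Ana"))
print(translate("inbox", "de"))  # no German catalog, falls back to English
```

Because the code only refers to message keys, adding a language in this sketch means adding a catalog, not touching the logic.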

These items do not necessarily include localization of the content; they are good practices that ensure that, if localization is needed later, it can be done easily (without a catastrophic refactoring).
Internationalization is a fundamental step in the design and development process for delivering content globally. Retrofitting a linguistically and culturally specific deliverable is much more expensive and time-consuming than including this process in development from the start.
Some general internationalization guidelines follow, to illustrate the power and scope of this process.
Text and Words Guidelines

  • Use simple sentences. Depending on your needs and target market the language might be different, but it is advisable to use English if you are aiming for a global audience. Furthermore, it is easier to translate from English to other languages. In addition, use a restricted vocabulary and follow a Noun-Verb-Object sentence structure. Not all users have the same language proficiency. You want to attract them, not lose their interest or bore them.
  • Try to avoid acronyms, abbreviations and slang. All are difficult and sometimes confusing to translate.
  • Avoid stringing three nouns together.
  • Don’t use local or computer jargon (for example: AI for artificial intelligence, App for application, Bug for programming error, etc.).
  • Avoid culturally specific examples. In some countries they are not well regarded.
  • All references to national, racial, religious and sexist stereotypes should be avoided. Even if they are jokes.
  • A telegraphic writing style is not recommended (terse style where words such as “and,” “the,” and “is,” are omitted).
  • Write in a formal or semi-formal style. An over-friendly style is not recommended as it can be considered condescending and irritating for the reader.
  • Adhere to local user language idioms and cultural contexts. Some words have different meanings in different countries.
  • Preserve the original word if it can’t be translated. For example: “Disk Drive” and “zooming” don’t exist in Thai.
  • Some words need more screen space when translated to other languages. For instance, in Gmail “Inbox” is translated to Spanish as “Bandeja de Entrada”. Allow extra space for cases like this (both horizontally and vertically). The National Language Technical Center from IBM proposed a guideline for calculating the extra space required from the number of characters.
  • When choosing which additional languages to support first, Microsoft suggests German for Europe, Arabic for the Middle East and Japanese for the Far East. Microsoft itself applied this guideline in the following order, based on the complexity of each language: Japanese, Arabic and German.
  • Place icon captions outside of the graphic to avoid redrawing the entire icon for each translation. That way, only the text needs to be translated into the desired language.
  • Adhere to local formats for date, time, money, measurements, addresses, and telephone numbers.
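
To make the last guideline concrete, the following Python sketch formats the same date and amount under three sets of local conventions. The `CONVENTIONS` table and both helper functions are hypothetical and hand-written for illustration; a real application would more likely rely on a locale database than on a table like this.

```python
from datetime import date

# Hypothetical, hand-written conventions for three locales; a real project
# would rely on a locale database rather than a table like this.
CONVENTIONS = {
    "en_US": {"date": "{m:02d}/{d:02d}/{y}", "thousands": ",", "decimal": "."},
    "de_DE": {"date": "{d:02d}.{m:02d}.{y}", "thousands": ".", "decimal": ","},
    "es_AR": {"date": "{d:02d}/{m:02d}/{y}", "thousands": ".", "decimal": ","},
}


def format_date(value, locale_name):
    """Render a date using the field order and separator of the locale."""
    pattern = CONVENTIONS[locale_name]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)


def format_number(value, locale_name):
    """Render a number with two decimals and the locale's separators."""
    conv = CONVENTIONS[locale_name]
    # Format with "," and "." first, then swap in the local separators.
    text = "{:,.2f}".format(value)
    integer_part, fraction = text.split(".")
    integer_part = integer_part.replace(",", conv["thousands"])
    return integer_part + conv["decimal"] + fraction


for name in ("en_US", "de_DE", "es_AR"):
    print(name, format_date(date(2014, 5, 3), name), format_number(1234567.891, name))
```

The same date reads as 05/03/2014 in one convention and 03/05/2014 in another, which is exactly the kind of ambiguity the guideline is meant to avoid.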

Image and Symbols

  • Adhere to local cultural and social norms. The meaning of an image may confuse users. Objects like mailboxes and trash cans look different in some countries. Develop proper images for the cultures where they will be used.
  • Use internationally accepted symbols. The International Organization for Standardization (ISO) has developed standard shapes for different purposes. Don’t reinvent the wheel.
  • Develop generic images. If users work with several versions of the same product and the images are completely different, they will be totally lost. Try to standardize the images in your product.
  • Be extremely careful with:
    • Religious symbols (crosses and stars).
    • The human body.
    • Women.
    • Hand gestures.
    • Flags.
    • Controversial geographic maps.
    • The cross and check for check boxes (the O mark used in Korea and Japan).
  • Review proposed graphical images early in the design cycle.

    Color, Sequence, and Functionality

    • Adhere to local color connotations and conventions. Color associations differ from country to country. In Spain, mailboxes are yellow, but in England and Argentina they are red or blue.
    • Provide the proper information sequence. The information shown on the screen has a logical flow that may be different in some countries. Clear examples are cultures that read from right to left, as in Arabic-speaking countries, or from top to bottom, as in Japan or Korea.
    • Provide the proper functionality. Some features of the product may be acceptable in some countries but not in others. All features should be reviewed to ensure that they don’t cause any cultural trouble.
    • Remove all references to features not supported. All internationalization functionalities that are not available or not supported should be eliminated. Otherwise, they will generate noise and confuse the users.

    From any point of view, internationalization should be considered from the very beginning. Modern web application frameworks such as the Django Framework and the Spring Framework provide internationalization features that help developers build powerful web applications without having to endure a complete refactoring later. Furthermore, the cost and time of building it in from the start are minimal compared with trying to internationalize the product once it is deployed. Even if a global scope is not the goal today, things might change in the future. Expect the unexpected... just in case.

    References

    • W3C Website - http://www.w3.org/International/
    • Designing Web Usability, Jakob Nielsen, Peachpit Press
    • The Essential Guide to User Interface Design: An Introduction to GUI Design Principles and Techniques, 3rd Edition, Wilbert O. Galitz, John Wiley & Sons

    Joaquin Tita: The Rabbit Problem

    Problem
    A certain man put a pair of rabbits in a place surrounded on all sides by a wall.
    How many pairs of rabbits can be produced from that pair in a year if it is supposed that every month
    each pair begets a new pair which from the second month on becomes productive?

    Leonardo da Pisa 
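
The count can be checked with a short simulation. The sketch below is only one reading of the statement, under assumptions that are not spelled out in it: no pair ever dies, a newborn pair spends one month maturing, and every mature pair begets exactly one new pair each month. The function name and the `start_mature` switch are made up for this example.

```python
def rabbit_pairs(months, start_mature=False):
    """Count rabbit pairs after `months` months.

    Assumptions (one reading of the problem): no pair dies, a newborn
    pair needs one month to mature, and every mature pair begets one
    new pair per month.
    """
    young, mature = (0, 1) if start_mature else (1, 0)
    for _ in range(months):
        # Every mature pair begets one new pair; last month's young mature.
        young, mature = mature, mature + young
    return young + mature


print(rabbit_pairs(12))                     # initial pair is newborn
print(rabbit_pairs(12, start_mature=True))  # initial pair already productive
```

Under these assumptions the monthly totals follow the sequence 1, 1, 2, 3, 5, 8, ... when the first pair is newborn, giving 233 pairs after twelve months, or 377 if the first pair is already productive.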

    Damián Avila: Zen themes updated

    OK, time to recap some things... As you know, Nikola 7.0.0 was released some weeks ago. It has a lot of improvements, bug fixes and new features. I recommend that you download it and try it! As part of the release, we took care to update all the plugins and themes inside the Nikola GitHub organization (don't forget you can contribute your own plugins and themes!). So I updated my own themes, in particular the Zen ones.


    Marcos Dione: local-markers-in-cookies-with-leaflet

    Last night I got an idea: start implementing some stuff on top of Elevation. The first thing I thought of was the ability to save markers in cookies, and to recover them every time the map was accessed. This would allow anybody to safely add personal markers on top of it without having to trust me (except for the map service). This might lead to something else entirely later, but for now that was the goal.

    So the first step was to find out how to access cookies in js. It seems that there is no native way to do it, and you have to parse them for yourself from the document.cookie attribute. Luckily someone already wrote some code for it. There was no license, but I think it's ok. Then I added another function to have a list of all the cookies whose name starts with some prefix, based on the readCookie() function:

    function readCookies(prefix) {
        // return the values of all cookies whose name starts with prefix + '_'
        var nameEQ = prefix + '_';
        var ca = document.cookie.split(';');
        var ans = [];
        for (var i = 0; i < ca.length; i++) {
            var index = 0;
            var c = ca[i];
            // skip leading spaces
            while (c.charAt(index) == ' ') {
                index = index + 1;
            }
            if (c.indexOf(nameEQ) == index) {
                // find the =; cookies without one are skipped
                var eq = c.indexOf('=', index);
                if (eq != -1) {
                    ans.push(c.substring(eq + 1));
                }
            }
        }
        return ans;
    }
    

    The next step was to encode and decode markers into strings. The format I decided on is simple: CSV, lat,lon,text,URL. So, here's the function that converts cookies to markers:

    function markersFromCookies () {
        var cookies= readCookies ("marker");
    
        for (var i= 0; i<cookies.length; i++) {
            var cookie= cookies[i];
    
            var data= cookie.split (',');
            // it's lat,lon,text,url
            var marker= L.marker([data[0], data[1]]).addTo (map);
            // TODO: reconstruct the url in case it got split
            if (data[3].length>0) {
                marker.bindPopup ('<a href="'+data[3]+'">'+data[2]+'</a>').openPopup ();
            } else {
                marker.bindPopup (data[2]).openPopup ();
            }
        }
    }
    

    20 lines of code and there already is a TODO comment :) That's because URLs can have commas in them, but for the moment I'm thinking of short URLs from sites like bitly.
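
One possible way to deal with that TODO, sketched here in Python rather than JavaScript to show the idea independently of Leaflet: since the URL is the last field, limit the number of splits so that any commas in it survive. The function names are hypothetical, and the sketch assumes that only the URL, never the text, may contain commas.

```python
def decode_marker(value):
    """Split a 'lat,lon,text,url' cookie value into its four fields.

    Assumption: only the URL (the last field) may contain commas; the
    text must not. At most three splits are made, so the URL stays whole.
    """
    lat, lon, text, url = value.split(",", 3)
    return float(lat), float(lon), text, url


def encode_marker(lat, lon, text, url=""):
    """Build the cookie value; reject text that would break decoding."""
    if "," in text:
        raise ValueError("marker text must not contain commas")
    return "{},{},{},{}".format(lat, lon, text, url)


encoded = encode_marker(-31.42, -64.18, "Venue", "http://example.org/?a=1,2")
print(encoded)
print(decode_marker(encoded))
```

Note that the limit argument of JavaScript's `split()` behaves differently: it truncates the result instead of keeping the remainder in the last element, so there the URL would have to be rebuilt with something like `data.slice(3).join(',')`.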

    All this was working perfectly in Firefox's Scratchpad. Then I decided to put it in "production". For that I took all the js from the original page and put it in a file, along with all these functions. I removed the permanent marker from my map, converting it into a cookie, pushed the code, reloaded and... it broke.

    This is not the first time I've seen js fall apart. Last year I helped PyCon.ar's organization with the site, specifically the map showing the city with markers for the venue and the bus station. I even had to map the venue because it was not in OSM's data :) If you follow that link, you'll see that the popups are completely broken. These things were working perfectly in a vacuum, but when I integrated them into the page they just fell apart. I never found out why.

    In my current situation, if I try to run markersFromCookies() in scratchpad, this is what I get:

    Exception: t.addLayer is not a function
    o.Marker<.addTo@http://cdn.leafletjs.com/leaflet-0.7.2/leaflet.js:7
    markersFromCookies@http://grulicueva.homenet.org/~mdione/Elevation/local.js:59
    @Scratchpad/1:1
    

    Basically, that's the call to L.marker().addTo(). Maybe the constructor is not working anymore, maybe it's something else entirely. At least this time a dim light in the back of my head told me that maybe the map variable is not global, as it seems to be from Scratchpad, so I simply passed the map from setup_map() to markersFromCookies() and now it works. Notice that the error message never mentioned this fact, but something else entirely. I'm just glad I didn't have to follow the hint and debug Leaflet's code to find this out. All I hope now is that I don't go insane with this small project.

    Next steps: adding new markers and sharing!


    openstreetmap javascript elevation

    Joaquin Tita: Native, Web and Hybrid Apps

    The explosion of mobile applications has led many companies to consider developing their own. The idea of having another way of interacting with customers is very tempting. For example, banks have found that allowing basic banking operations through an application attracts customers who already use a mobile device. However, developing an application is not as easy as it sounds. There are different approaches one can take depending on the requirements and needs of the company. In general, there are three popular approaches: Native Applications, Web Applications and Hybrid Applications.

    What are Native Applications?
    They are applications developed to run on only one platform. Currently, the platforms with the most subscribers are iOS from Apple and Android from Google, with 52.1% and 41.2% respectively. Far behind those numbers come other mobile platforms with fewer “followers”, like BlackBerry 10 from BlackBerry Ltd and Firefox OS from the Mozilla Foundation.

    In general, mobile applications are downloaded from common markets or stores, like Apple’s App Store or Google Play Store (formerly called Android Market), and then installed on the device. The applications are offered in two ways: free or paid. This is one way in which developers and stores earn money: one party develops the application and the other distributes it through the store. The other way is through ads in the application and on websites. This is how Google Play Store, for example, gets most of its profits.

    Revenues and Downloads
    In 2013, Apple reported that its customers spent over $10 billion on the App Store, with almost three billion downloads in December. On the other side of the street, Google Play Store’s revenue share has been growing at a moderate but steady pace compared with Apple. Apple’s App Store is mostly fed by China and the US and, to a lesser extent, by Vietnam and South Africa.

    In the battle for app downloads, the current champion is Google Play Store, with a lead of nearly 45% over Apple’s App Store. This incredible volume of downloads is in part due to emerging markets such as Mexico and Turkey, where the adoption of Android devices is booming. Downloads were also strong in countries like Brazil and Russia. Moreover, the already established markets (US and UK) are helping to close the distance in revenue. Projections indicate that Google will continue narrowing the revenue gap with Apple and leading in downloads during 2014.


    What are the Pros and Cons of developing Native Applications?
    Pros:
    • Native applications use built-in features of the device and perform faster. Some of the most common features are the camera, the GPS, the accelerometer and the compass.
    • These applications get support from the app stores and marketplaces. Users can find and download what they want from the stores.
    • A native mobile app can produce the best user experience. High performance and fluid interaction make the most of the built-in features and hardware for the benefit of the user.
    • Most stores have strict audit procedures to ensure that new apps follow their security standards. Therefore, users can download applications safely.
    • Native development suits developers better because platforms provide SDKs and other tools for developing applications faster and more easily.
      Cons:
      • A native approach can be expensive, since the application must be developed separately for each platform.
      • Programming skills are required for each platform.
      • The cost of maintaining native applications is higher, especially if the application supports more than one mobile platform.
      • The approval process for publishing an application on a marketplace can be long, as the requirements are high. Also, there is no guarantee that the application will be accepted, let alone listed at the top.
      • If there are different versions of the same app, it will be difficult to maintain and support them.

      What are Web Applications?
      They are websites that may look and feel like native applications but are not the same under the hood. A web app is simply HTML5 (or plain HTML), JavaScript and CSS rendered by a browser. The web browser renders the web application like any other regular webpage. There is no need to download the application in order to access it; all that is needed is the URL.
      Some elements typically present in native applications are emulated by the web application, such as swiping horizontally to change section, or a design similar to that of Android or iOS applications.

      What are the Pros and Cons of developing Web Applications?

      Pros:


      • They are easy to maintain as they share the same source code for multiple mobile platforms.
      • Web Apps can be compatible with almost any old device (with browsing capabilities).
      • These apps do not have to pass a rigorous approval process; users only need to save the URL.
      • There are no restrictions of time or design for releasing the application. The developers publish the application whenever necessary.
      • All updates are transparent to users, since they simply visit the URL and always run the latest version available.

        Cons:

        • Web apps can only access a subset of the features available on the devices.
        • The cost of supporting multiple platforms is high, and so is the cost of maintenance.
        • The wide variety of devices makes it difficult to offer support and to study usage patterns.
        • The visibility of the application is reduced, as it is not listed in an app store.
        • There are no standardized quality procedures for web apps, so the quality and security of an app are not guaranteed.

        What are Hybrid Applications?
        These applications put a web application inside a native wrapper. That is, the wrapper is a native application from the outside, while the content is HTML, JavaScript and CSS. These kinds of applications are distributed through app stores, just like native applications. Although there are differences, users in general cannot distinguish between native applications and hybrid applications.

        Pros:
        • They are faster to develop, since the development process is the same as the one followed in web development and only a small amount of native coding is necessary.
        • The tools for developing hybrid applications are evolving and maturing rapidly getting closer to native tools.
        • Hybrid Applications can be deployed in app stores.
        • They provide the best of both worlds, native apps + HTML5.
        • These applications can work without Internet connection.
          Cons:
          • They are not native apps despite being packaged in a native wrapper. The web engine is the one provided by the platform, and so the performance is not the same as that of a native app.
          • No frequent updates (in the case of offline caching).

          Lastly, when deciding what type of application your business needs, there are some aspects that should be considered carefully:
          • Deciding how important is speed and performance.
          • Determining if Internet access is necessary.
          • Allocating the necessary budget for developing the application.
          • Choosing between using device’s built-in features or not.
          • Adopting an advertisement strategy (monetize the application).
          • Focusing on your target users (specific market/platform dependent).



                      Martín Gaitán: PDF scraping with IPython and pdftotext

                      A few days after my data analysis of the provisional vote count in the latest Córdoba elections, I received an email that began like this:

                      Hello Martín, I'm Franco Luque, professor and researcher in Computer Science at FaMAF. Together with Jorge Sánchez, another researcher here who works on image processing, we saw your initiative, which is very good by the way, and we found the possibility of processing the images of the tally sheets to recognize the handwritten numbers very interesting. So interesting, in fact, that we thought about organizing a programming day (what some people call a hackathon, much to our regret :P), possibly on Saturday of next week.

                      And so, in record time, together with Franco, Jorge, Jairo Trad, Andrés Vazquez and Marysol Farneda, we organized the event Democracia con códigos, in which 35 people took part! The event was a success in every sense and kicked off the creation of the Open Data Córdoba group.

                      Opening data for democracy

                      One of the fundamental requirements for investigating data is having it. Although the official site datospublicos.gob.ar had already published official datasets for the elections, the site resultados.gob.ar, where the telegrams were published in real time, had more information.

                      In particular, there is a section showing summaries of the provisional results by district that includes a very interesting piece of data: the time at which each polling place was counted in the provisional count. Unfortunately, that information is not very useful while it is trapped in PDFs.

                      Although we didn't get to use it at the event (my idea was to add a timeline to the map to see how things evolved), the day before the hackathon I spent a little while extracting that data so that it could be processed.

                      I'm publishing it now because I think it is useful not only as an example of extracting data from a PDF but also as a demonstration of what IPython Notebook (by the way, this article is a notebook) can do as a "hacking" environment, letting you use Python, many other languages and any tool available on the system in an integrated and easy way.

                      First, I fetch all the PDFs for Córdoba.

                      In [1]:
                      for i in range(1, 27):
                          pdf = "C040%02d.pdf" % i
                          !!wget http://www.resultados.gob.ar/telegramas/telegramas/colegios/04/{pdf}
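
For readers without `wget` or outside IPython, the same list of files can be built in plain Python. This is only a sketch, assuming Python 3; `BASE_URL`, `pdf_names`, `pdf_urls` and `download_all` are names introduced here for illustration, and the download helper is defined but deliberately not called.

```python
from urllib.request import urlretrieve

# Base address taken from the wget call in the cell above.
BASE_URL = "http://www.resultados.gob.ar/telegramas/telegramas/colegios/04/"

# Same naming scheme as the cell above: C04001.pdf ... C04026.pdf
pdf_names = ["C040%02d.pdf" % i for i in range(1, 27)]
pdf_urls = [BASE_URL + name for name in pdf_names]


def download_all():
    """Fetch every PDF into the current directory (needs network access)."""
    for name, url in zip(pdf_names, pdf_urls):
        urlretrieve(url, name)


print(len(pdf_urls))
print(pdf_urls[0])
```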
                      
                      In [2]:
                      !ls *.pdf
                      
                      C04001.pdf  C04004.pdf  C04007.pdf  C04010.pdf  C04013.pdf  C04016.pdf  C04019.pdf  C04022.pdf  C04025.pdf
                      C04002.pdf  C04005.pdf  C04008.pdf  C04011.pdf  C04014.pdf  C04017.pdf  C04020.pdf  C04023.pdf  C04026.pdf
                      C04003.pdf  C04006.pdf  C04009.pdf  C04012.pdf  C04015.pdf  C04018.pdf  C04021.pdf  C04024.pdf
                      

                      Let's see what a page looks like

                      In [3]:
                      from IPython.display import Image
                      !convert -density 144 C04001.pdf[0] example.png
                      Image('example.png')
                      
                      Out[3]: