Marcos Dione: implementing-selenium-with-python-and-qt

I'm writing a python module that allows me to 'drive' a site using Qt. This means that I can navigate the site, fill forms, submit them and read the resulting pages and scrape them, Selenium style. The reasons I'm using Qt are that it has enough support for the site I'm driving (it's the web frontend of the SIP telephony solution we're using, which has an incomplete API and I have to automatize several aspects not covered by it); there are python bindings; and because I can do it headless: instead of using browser instances, I simply instanciate one QWebPage[1] per thread and that's it.

The first thing I learned today is that JS objects representing the DOM elements have two sets of value holders: attributes and properties. The properties is what in Python we call attributes: the object's elements which are accesible with the '.' operator and hold instance values. The attributes are in fact the HTML element's attributes that gave the properties' initial values. That is, given the following HTML element:

<input type="text" name="foo° id="bar" value="quux">

the initial JS object's attributes and properties will have those values. If you change the value with your browser, the value property of that element will be changed, but not the attribute. When you submit the form, the value properties of all the form elements are used, so if you "only' change the value attribute, that won't be used. So forget attributes. Also, the DOM is the representation of the actual state of the page, but this state is never reflected in the HTML source that you can ask your browser to show, but you see those changes reflected in the browser's debugger. It's like they really wanted[3] to keep initial values apart from current state[2].

On the Qt side, QWebElement is only the DOM element representation, not the JS object[4], so you can't access the properties via its API, but by executing JS[5]:

e = DOMRoot.findFisrt('[name="foo"]')
e.evaluateJavaScript("this.value = 'abracadabra'")

Tonight I finished fixing the most annoying bug I had with this site. To add a user I have to fill a form that is split in 7 'tabs' (which means 7 <div>s with fields where only one is shown at a time). One of the fields on the second tab has a complex JS interaction and I was cracking my skull trying to make it work. Because the JS is reacting to key presses, setting the value property was not triggering it. Next I tried firing a KeyboardEvent in JS, but I didn't succeed. Maybe it was me, maybe the fact that the engine behind QWebPage is the original Webkit and for some reason its JS support is lacking there, who knows.

But the good guys from #qtwebkit gave me a third option: just send plain QKeyEvents to the input element. Luckily we can do that, the web engine is completely built in Qt and supports its event system and more. I only had to give focus to the widget.

Again, I tried with JS and failed[7], so I went back cheating with Qt behind curtains. QWebElemnt.geometry() returns the QRect of the QWidget that implements the input element; I just took the .center() of it, and generated a pair of mouse button press/release events in that point. One further detail is that the .geometry() won't be right unless I force the second tab to be shown, forcing the field to be drawn. Still, for some reason getting a reference to the input field on page load (when I'm trying to figure out which fields are available, which in the long run does not make sense, as fields could easily be created or destroyed on demand with JS) does not return an object that will be updated after the widget is repositioned, so asking its geometry returns ((0, -1), (-1, 0)), which amounts to an invalid geometry. The solution is to just get the reference to the input field after forcing the div/tab to be shown.

Finally, I create a pair of key press/release events for each character of the string I wanted as value, and seasoned everything with a lot of QMainLoop.processEvents(). Another advantage of using the Qt stuff is that while I was testing I could plug a QWebView, sprinkle some time.sleep() of various lengths, and see how it behaved. Now I can simply remove that to be back to headlessness.

I'm not sure I'll publish the code; as you can see, it's quite hacky and it will require a lot of cleanup to be able to publish it without a brown paper bag in my head.

[1] Yes, I'm using qt5.5 because that's what I will have available in the production server.

[2] Although as I said, you can change the attributes and so you lose the original values.

[3] I guess the answer is in in the spec.

[4] I think i got it: QWebElement is the C++ class that is used in WebKit to represent the HTML tree, the real DOM, while somewhere deeper in there are the classes representing the JS objects which you just can't reach[6].

[5] This clearly shows that there is a connection between the DOM object and the JS one, you just can't access it via the API.

[6] This is the original footnote: Or something like that. Look, I'm an engineer and I usually want to know how things work, but since my first exposure to HTML, CSS and JS, back in the time when support was flaky and fragmented on purpose, I always wanted to stay as far away from them as possible. Things got much better, but as you can see the details are still somewhat obscure. I guess, I hope the answer is in the spec.

[7] With this I mean that I executed something and it didn't trigger the events it should, and there's no practical way to figure out why.

python pyqt

Facundo Batista: Regalo de fin de año: Recordium

En estas últimas semanas terminé de poner a punto un proyectito que había empezado durante el año. Aunque le faltan algunos detalles, ya es funcional y útil.

Se llama Recordium. Es una aplicación sencillita que ayuda al vos-fuera-de-tu-compu a recordarle cosas a tu futuro vos-en-la-compu.


La idea es que ejecutás Recordium en tu computadora, y se pone ahí como un iconito pequeñito.

Después, en cualquier momento, estando en la calle, cortando el pasto, en la cola de la panadería, etc, cuando te acordás de algo que tenés que hacer, le mandás un texto o audio de Telegram a tu Bot de Recordium.

Cuando volvés a tu computadora (donde tomás las acciones correspondientes sobre eso que te habías acordado), el iconito de Recordium va a estar iluminado, te va a decir que tenés un mensaje nuevo (o más), y ahí podés leer/escuchar lo que te habías acordado en otro momento.

¿Se podría hacer algo similar utilizando herramientas más complejas? Sí. ¿O algún servicio de Google? También, pero no quiero meterle más yo a Google. Igual, lo más importante de Recordium es que me sirvió de proyecto juguete para (al mismo tiempo que lograba una funcionalidad que yo quería) tener algo hecho en Python 3 y PyQt 5.

Marcos Dione: ayrton-0.9.1

Last night I realized the first point. Checking today I found the latter. Early, often, go!

  • ayrton-0.9 has debug on. It will leave lots of files laying around your file system.
  • Modify the release script to do not allow this never ever more.
  • make install was not running the tests.

Get it on github or pypi!

python ayrton

Marcos Dione: ayrton-0.9

Another release, but this time not (only) a bugfix one. After playing with bool semantics I converted the file tests from a _X format, which, let's face it, was not pretty, into the more usual -X format. This alone merits a change in the minor version number. Also, _in, _out and _err also accept a tuple (path, flags), so you can specify things like os.O_APPEND.

In other news, I had to drop support for Pyhton-3.3, because otherwise I would have to complexify the import system a lot.

But in the end, yes, this also is a bugfix release. Lost of fd leaks where plugged, so I suggest you to upgrade if you can. Just remember the s/_X/-X/ change. I found all the leaks thanks to unitest's warnings, even if sometimes they were a little misleading:

testRemoteCommandStdout (tests.test_remote.RealRemoteTests) ... ayrton/parser/pyparser/ <span class="createlink">ResourceWarning</span>: unclosed <socket.socket fd=5, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, raddr=/tmp/ssh-XZxnYoIQxZX9/agent.7248>
  self.stack[-1] = (dfa, next_state, node)

The file and line cited in the warning have nothing to do with the warning itself (it was not the one who raised it) or the leaked fd, so it took me a while to find were those leaks were coming from. I hope I have some time to find why this is so. The most frustrating thing was that unitest closes the leaking fd, which is nice, but in one of the test cases it was closing it seemingly before the test finished, and the test failed because the socket was closed:

ERROR: testLocalVarToRemoteToLocal (tests.test_remote.RealRemoteTests)
Traceback (most recent call last):
File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/", line 225, in wrapper
    test (self)
File "/home/mdione/src/projects/ayrton_clean/ayrton/tests/", line 235, in testLocalVarToRemoteToLocal
    self.runner.run_file ('ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay')
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 304, in run_file
    return self.run_script (script, file_name, argv, params)
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 323, in run_script
    return self.run_tree (tree, file_name, argv, params)
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 336, in run_tree
    return self.run_code (code, file_name, argv)
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 421, in run_code
    raise error
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 402, in run_code
    exec (code, self.globals, self.locals)
File "ayrton/tests/scripts/testLocalVarToRealRemoteToLocal.ay", line 6, in <module>
    with remote ('', _test=True):
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 362, in __enter__
    i, o, e= self.prepare_connections (backchannel_port, command)
File "/home/mdione/src/projects/ayrton_clean/ayrton/", line 270, in prepare_connections
    self.client.connect (self.hostname, *self.args, **self.kwargs)
File "/usr/lib/python3/dist-packages/paramiko/", line 338, in connect
File "/usr/lib/python3/dist-packages/paramiko/", line 493, in start_client
    raise e
File "/usr/lib/python3/dist-packages/paramiko/", line 1757, in run
    self.kex_engine.parse_next(ptype, m)
File "/usr/lib/python3/dist-packages/paramiko/", line 75, in parse_next
    return self._parse_kexdh_reply(m)
File "/usr/lib/python3/dist-packages/paramiko/", line 112, in _parse_kexdh_reply
File "/usr/lib/python3/dist-packages/paramiko/", line 2079, in _activate_outbound
File "/usr/lib/python3/dist-packages/paramiko/", line 1566, in _send_message
File "/usr/lib/python3/dist-packages/paramiko/", line 364, in send_message
File "/usr/lib/python3/dist-packages/paramiko/", line 314, in write_all
    raise EOFError()

This probably has something to do with the fact that the test (a functional test, really) is using threads and real sockets. Again, I'll try to investigate this.

All in all, the release is an interesting one. I'll keep adding small features and releasing, let's see how it goes. Meanwhile, here's the changelog:

  • The 'No Government' release.
  • Test functions are no longer called _X but -X, which is more scripting friendly.
  • Some if those tests had to be fixed.
  • Dropped support for py3.3 because the importer does not work there.
  • tox support, but not yet part of the stable test suite.
  • Lots and lots of more tests.
  • Lots of improvements in the remote() tests; in particular, make sure they don't hang waiting for someone who's not gonna come.
  • Ignore ssh remote() tests if there's not password/phrase-less connection.
  • Fixed several fd leaks.
  • _in, _out and _err also accept a tuple (path, flags), so you can specify things like os.O_APPEND. Mostly used internally.

Get it on github or pypi!

python ayrton

Facundo Batista: Incubadora de eventos

Una de los roles claves de la Asociación Civil de Python Argentina (en adelante "AC") debería ser que la gente se junte y comparta conocimiento. En consonancia con esto, estuve armando la siguiente idea para ayudar a que se generen eventos y reuniones.

Hay distintas formas en la cual la AC puede ayudar a los organizadores de un evento, entre ellas:

  • Ayuda logística: transmitir experiencia, ayudar a resolver inconvenientes que se produzcan en la operatoria del día a día
  • Ayuda financiera: como es normal que algunos sponsors se comprometan a aportar dinero, pero luego ese aporte se demora (por el sponsor en sí, o trámites de todo tipo, especialmente internacionales), la AC puede adelantarle dinero al organizador, el cual ingresará a la AC luego cuando el sponsor efectivice.
  • Ayuda económica: Partiendo de la idea base de que el evento salga hecho, o incluso genere dinero para la AC, hay dos puntos en que se puede ayudar económicamente: siendo sponsors de un evento muy chico (ej: pagando unas pizzas para un sprint), pero me parece más importante poder ser una red de contención, en el caso de que por fallos en la planificación conjunta se pierde algo de dinero: que la pérdida la cubra la AC y no la persona que organizó.
  • Ayuda institucional: Por un lado es útil tener una entidad legal para poder presentarse a más sponsors, o al estado, dando más seriedad al evento, y además es imprescindible que los sponsorships o donaciones para el evento se hagan a una entidad ya formada, y no a individuos; esto le saca un quilombo personal al organizador, y permite pagos internacionales.

Para poder ejecutar estas ayudas, los organizadores del evento y la AC tienen que colaborar, seguir ciertos pasos y reglas. ¿Quizás incluso firmar algún contrato?

Describo la metodología a grandes rasgos en los siguientes puntos. Pero esto hay que pensarlo, refinarlo, y escribirlo bien en detalle, para lograr dos cosas:

  • que el organizador u organizadores entienda bien como la AC va a jugar en esto
  • que la AC corra la menor cantidad de riesgos innecesarios posibles

Entonces, la idea es tener un presupuesto base, un template de lo que sería el presupuesto final del evento, con todo lo que podamos pensar y se nos pueda ocurrir de eventos anteriores. Se recorre el mismo con el organizador, en una primera instancia, y se elije lo que el organizador "quiere hacer"; luego el organizador mismo tiene que poner un estimado de costo a cada ítem, y separar los ítems en tres secciones (o quizás sólo dos para eventos chicos):

  • de mínima: sin esto el evento no sale
  • intermedio: con esto el evento está lindo
  • de máxima: si se logra esto es un golazo

En función de todo lo elegido, hay que planificar los sponsorships necesarios, en base a niveles: definirlos, incluyendo el costo y las retribuciones. Es imprescindible que la AC de "el visto bueno" sobre esta planificación, y que luego realice un "seguimiento en el tiempo" de la evolución de la ejecución del presupuesto. Acá la AC también puede jugar un rol centralizador, básicamente armando un folleto de "hay un nuevo evento, ¿querés ser sponsor?" y mandándolo a todos las empresas, instituciones, o lo que sea que tengamos en carpeta.

También en esta interacción AC/organizador se pueden pedir más cosas, o hacerlas obligatorias, ejemplo (¡hay que pensar más!):

  • que el evento tenga un código de conducta; incluso la AC puede proveerlo, junto con un pequeño texto de "qué hacer si se recibe una denuncia"
  • que el sitio web del evento sea "exportable a estático", así la AC lo guarda y sirve a futuro; también la AC podría dar un sitio web base, y hostearlo.

Por último, un detalle: estaría bueno que la AC también cumpla el rol de "paraguas legal" (básicamente, lo que arriba describo como "ayuda institucional") para ayudar a otros grupos relacionados con el software y/o cultura libre, para que puedan ellos hacer sus eventos.

Facundo Batista: PyCon Argentina 2016

El fin de semana pasado fue la octava edición de la conferencia nacional de Python en Argentina. Se realizó en Bahía Blanca, tres días de talleres y charlas.

Yo dí una charla, "Bindings, mutable default arguments, y otros quilom... detalles", y asistí a otras; las que más me gustaron fueron "Poniéndonos un poco más serios con Kivy" por Sofía Martin y alguien más que no recuerdo, "Compartiendo memoria eficientemente con proxies" por Claudio Freire, "Argentina en Python: comunidad, sueños, viajes y aprendizaje" por Humitos, "MicroPython en EDU-CIAA" por Martín Ribelotta, "Redes neuronales con Python utilizando Keras" por Fisa, "Deep learning: aprendiendo con la escafandra" por Javi Mansilla, e "Introducción a programación paralela con PyOpenCL" por Celia Cintas.

Mi charla, renovada

Las keynotes estuvieron muy bien, también. Fernando Schapachnik, de la Fundación Sadosky nos habló del problema de género en las comunidades informáticas (con datos, análisis, y una arenga política al final que estuvo bárbara). Ángel Medinilla nos dío una charla-show-standup sobre metodologías ágiles (excelente presentación). Y la última fue de Victoria Martínez de la Cruz, contando las ventajas y desventajas de trabajar de forma remota (algo que se está imponiendo más y más en las comunidades de software y que está lleno de mitos, así que era muy necesaria).

La organización del evento también estuvo impecable. Se nota que laburaron un montón y salió todo muy bien.

Los asistentes a punto de escuchar una plenaria

Más allá del costado técnico, y de lo que sucede en estos eventos de charlas que se generan, reencuentros, etc, tanto en pasillos como luego de la conferencia en bares o por ahí, quiero destacar el lado "humano"que tuvo esta conferencia.

No sólo las keynotes hablaron de las personas o sus grupos de trabajo, sino que también tuvimos charlas que hicieron lagrimear a varios, como la de Humitos que mencioné arriba o la de Roberto Alsina ("Cómo desarrollar software libre (o no) y no morir en el intento (o no)", que no pude ver pero me contaron). Pero había algo más en el ambiente. Gente comentando lo copada que son organizadores y asistentes en este evento, que cómo te ayudan con todo, que se preocupan, etc. Había muy buena onda por todos lados.

Relajando un poco, en el almuerzo del primer día

Trabajando en uno de los espacios abiertos que había

Hubo una anécdota interesante, también. Resulta que una señora vio en un kiosco a unos asistentes a la conferencia que tenían algo de Python encima. Entonces fue a la escuela de su hijo mayor, de 13 años, lo sacó antes de hora y volvieron a la zona del kiosco (que obviamente, era muy cerca del edificio de la conferencia). Justo pasábamos otros chicos y yo, vieron un pin de Python que llevo en la mochila, y nos preguntaron qué onda. Les contamos de la conferencia, Diego M. les regaló el librito del evento, y listo.

Nosotros pensábamos que terminaba ahí. Nada más lejos.

Al rato volvemos al edificio donde se desarrollaba el evento y vemos que sube a la zona de la conferencia la madre y los dos niños. El pibe de 13 se colgó todo el día yendo de charla en charla, mientras la mamá le hacía el aguante en una zona con sillones. No sólo eso, sino que fueron el sábado y el domingo a la conferencia, y se pasaron todo el finde allí. Notable.

Todas las manos todas

Para cerrar les dejo las fotos que saqué, más esta búsqueda de tuiter que está buena.

Marcos Dione: ayrton-

I'll keep this short. During the weekend I found a bug in ayrton. I fixed it in develop, and decided to make a release with it, because it was kind of a showstopper. It was the first time I decided to use ayrton for a oneliner. It was this one:

ayrton -c "rm(v=True, locate('.xvpics', _out=Capture))"

See, ayrton's native support for filenames with spaces makes it a perfect replacement for find and xargs and tools like that. That command simply finds all the files or directories called like .xvpics using locate and removes them. There is a little bit of magic where locate's output becomes rm's arguments, but probably not magic enough: _out=Capture has to be specified. We'll probably fix that in the near future.

So, enjoy the new release. It just fixes a couple of bugs, one of them directly related to this oneliner. Here's the changelog:

  • The 'Release From The Bus' release.
  • Bugfix release.
  • Argv should not be created with an empty list.
  • Missing dependencies.
  • Several typos.
  • Fix for _h().
  • Handle paramiko exceptions.
  • Calling ayrton -c <script> was failing because the file name properly was not properly (f|b)aked.
  • ayrton --version didn't work!

Get it on github or pypi!

Meanwhile, a little about its future. I have been working on ayrton on and off. Right now I'm gathering energy to modify pypy's Python parser so it supports py3.6's formatted string literals. With this I can later update ayrton's parser, which is based on pypy's. A part of it has been done, but then I run out of gas. I think FSLs are perfect for ayrton in its aim to replace shell script languages. In other news, there's a nasty remote() bug that I can't pin down. These two things might mean that there won't be a significant release for a while.

python ayrton

Marcos Dione: the-truth-about-bool-in-Python

I was trying to modify ayrton so we could really have sh[1]-style file tests. In sh, they're defined as unary operators in the -X form[2], where X is a letter. For instance, -f foo returns true (0 in sh-peak) if foo is some kind of file. In ayrton I defined them as functions you could use, but the names sucked a little. -f was called _f() and so on. Part of the reason is, I think, that both python-sh and ayrton already do some -/_ manipulations in executable names, and part because I thought that -True didn't make any sense.

A couple of days ago I came with the idea that I could symply call the function f() and (ab)use the fact that - is a unary operator. The only detail was to make sure that - didn't change the truthiness of bools. In fact, it doesn't, but this surprised me a little, although it shouldn't have:

In [1]: -True
Out[1]: -1

In [2]: -False
Out[2]: 0

In [3]: if -True: print ('yes!')

In [4]: if -False: print ('yes!')

You see, the bool type was introduced in Python-2.3 all the way back in 2003. Before that, the concept of true was represented by any 'true' object, and most of the time as the integer 1; false was mostly 0. In Python-2.2.1, True and False were added to the builtins, but only as other names for 1 and 0. According the that page and the PEP, bool is a subtype of int so you could still do arithmetic operations like True+1 (!!!), but I'm pretty sure deep down below the just wanted to be retro compatible.

I have to be honest, I don't like that, or the fact that applying - to bools convert them to ints, so I decided to subclass bool and implement __neg__() in such a way that it returns the original value. And that's when I got the real surprise:

In [5]: class FalseBool (bool):
   ...:     pass
TypeError: type 'bool' is not an acceptable base type

Probably you didn't know (I didn't), but Python has such a thing as a 'final class' flag. It can only be used while defining classes in a C extension. It's a strange flag, because most of the classes have to declare it just to be subclassable; it's not even part of the default flags. Even more surprising, is that there are a lot of classes that are not subclassable: around 124 in Python-3.6, and only 84 that are subclassable.

So there you go. You learn something new every day. If you're curious, here's the final implementation of FalseBool:

class FalseBool:
    def __init__ (self, value):
        if not isinstance (value, bool):
            raise ValueError

        self.value= value

    def __bool__ (self):
        return self.value

    def __neg__ (self):
        return self.value

This will go in ayrton's next release, which I hope will be soon. I'm also working in implementing all of the different styles of expansion found in bash. I even seem to have found some bugs in it.

python ayrton

[1] I'm talking about the shell, not to confuse with python-sh.

[2] Well, there are a couple of infix binary operands in the form -XY.

Marcos Dione: barely-working-centerline-JOSM-plugin

I just uploaded my first semi-automated change. This change was generated with my hack for generating centerlines for riverbank polygons. This time I expanded it to include a JOSM plugin which will take all the closed polygons from the selection and run the algorithm on them, creating new lines. It still needs some polishing, like making sure they're riverbanks and copying useful tags to the new line, and probably running a simplifying algo at some point. Also, even simple looking polygons might generate complex lines (in plural, and some of these lines could be spurious), so some manual editing might be required afterwards, specially connecting the new line to existing centerlines. Still, I think it's useful.

Like I mentioned last time, its setup is quite complex: The JOSM plugin calls a Python script that needs the Python module installed. That module, for lack of proper bindings for SFCGAL, depends on PostgreSQL+PostGIS (but we all have one, right? :-[ ), and connection strings are currently hardcoded. All that to say: it's quite hacky, not even alpha quality from the installation point of view.

Lastly, as imagico mentioned in the first post about this hack, the algorithms are not fast, and I already made my computer start thrashing the disk swapping like Hell because pg hit a big polygon and started using lots of RAM to calculate its centerline. At least this time I can see how complex the polygons are before handing them to the code. As an initial benchmark, the original data for that changeset (I later simplified it with JOSM's tool) took 0.063927s in pg+gis and 0.004737s in the Python code. More test will come later.

Okey, one last thing: Java is hard for a Pythonista. At some point it took me 2h40 to write 60 lines of code, ~2m40 per line!

openstreetmap gis python