Friday, 21 October 2011

Old people

People who come from the future to warn us of a danger, and whom nobody ever believes, really do exist: we call them "old people".

Thursday, 30 December 2010

The private copying tax

At its root, the concept of a tax on private copying is, to say the least... original, since it aims to compensate rights holders for the harm caused by illegal copies. I suppose a more "presentable" justification was found, much like the brilliant idea of "failure to secure one's ADSL line" for HADOPI.

Better still, a recent extension of the tax has a rather curious wording that seems to favour Microsoft at the expense of other operating system vendors, as reported by UFC-Que choisir.

A tax proportional to storage capacity seems just about logical (once you have accepted the idea of paying a tax to back up your holiday photos on DVD), but a distinction based on the nature of the operating system is extremely surprising.

Having worked a little in this field, I can say that the distinction between a "mobile" operating system and a "desktop" operating system is getting thinner and thinner.

As for Linux, the main adaptation needed to run it on a "mobile" device essentially concerns the processor and the handling of the peripherals specific to such devices (a relatively small LCD screen, a touch panel, ...). The other difference is that one will choose a "lightweight" distribution rather than a standard Ubuntu (although I have seen the latter run on an evaluation board... slowly).

What criteria decide whether an operating system falls under the designation "proprietary" ("propre"), "mobile" or "standard"? This can only lead to disputes. Let us hope they will be settled more transparently than their cause came about.


Wednesday, 11 August 2010

Unit tests as a language tutorial

Writing a tutorial for a complex piece of software is generally difficult and a bit boring, especially when the tutorial covers a programming language intended for non-technical users and you are still tweaking the language itself at the same time.

You have to introduce features slowly, preferably one at a time, and give many examples and code samples. One particularly boring aspect is that you have to check that these examples actually work. This is especially annoying when you are modifying the language itself at the same time.

This happens more often than not, because writing documentation tends to make you question the design of the thing being documented. It often lets you realise that you have taken a wrong path on a particular feature, or that something is not convenient enough and could be improved.

So when you document a programming language that is not yet carved in stone, and want it to be as good as possible, you make the improvements. You then enter a document/question/improve loop. This is not bad, but the time you spent writing some parts of the documentation is lost.

On the other hand, I have a test file for Lama. It gathers a number of unit tests, some of them very basic, such as checking that the precedence rules between * and + are honoured. As I split my time rather anarchically between documentation, language tweaks and writing libraries, it has lagged behind somewhat lately. When reviewing it, I observed that the progression of the tests is quite similar to the progression of the tutorials: basic stuff first, then the more complex features and combinations. This is of course not an accident. The first tests were written first, when Lama was less capable than a desk calculator; and if a regression makes, say, string concatenation fail, it is better for me that the string concatenation test fails rather than, say, the closure test.

This observation of the near-parallel progression of the tutorial and the unit tests made me realise that the unit test file, with appropriate comments, could be the tutorial. This is, as a matter of fact, not a new idea; some claim that Test-Driven Development is not about making bug-free software, but about driving the design of programs and documenting them. Indeed, unit tests by definition show what to do to get a particular result, or what result to expect from a particular sequence of instructions.

Another observation is that a significant part of the text of the programming language tutorials one may find on the internet is actually code. The Scribble documentation generator for PLT Scheme/Racket even features a means to execute the sample code and include its result in the generated documentation.

In some situations, you may have a document that is half plain text and half code, or half plain text and half dynamically generated text (like web pages, for instance). One then has to make a choice: either embed the code in the text, or embed the text in the code. All the programming language tutorials I have seen use the former (code embedded in text). I'll innovate a bit here by choosing the other way.
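
Lama's own test file is not reproduced here, so the following is only a rough sketch of the idea, with Python standing in for Lama. The lesson names and the run_lessons helper are invented for illustration. The text lives in the comments, the tests are the examples, and the order goes from basic to complex as described above.

```python
# Tutorial-as-tests sketch. Python stands in for Lama; the lesson order
# (arithmetic, then strings, then closures) mirrors a tutorial's progression.

def lesson_1_precedence():
    # '*' binds tighter than '+', so this is 2 + (3 * 4), not (2 + 3) * 4.
    assert 2 + 3 * 4 == 14
    assert (2 + 3) * 4 == 20

def lesson_2_string_concatenation():
    # '+' joins two strings end to end.
    assert "la" + "ma" == "lama"

def lesson_3_closures():
    # A closure remembers the variables of the function that created it.
    def make_counter():
        count = 0
        def increment():
            nonlocal count
            count += 1
            return count
        return increment
    counter = make_counter()
    assert counter() == 1
    assert counter() == 2

# Basic lessons come first, so a regression in a basic feature is
# reported by the basic lesson rather than by a later, more complex one.
LESSONS = [lesson_1_precedence, lesson_2_string_concatenation, lesson_3_closures]

def run_lessons():
    """Run lessons in order; return the name of the first failing one, or None."""
    for lesson in LESSONS:
        try:
            lesson()
        except AssertionError:
            return lesson.__name__
    return None
```

Calling run_lessons() on a healthy implementation returns None; after a regression it names the earliest lesson that no longer holds.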


Wednesday, 14 July 2010

Lama tutorial blueprint: preamble

Computers are amazing machines. Thanks to the Internet and to free software, it is quite easy to find software that fits our needs -- for free.

However, the more specific the task, the harder it is to find a ready-made piece of software. Often, as a matter of fact, the task is a combination of simpler ones for which programs do exist. But how do we combine them?

This problem, probably as old as computers, has different kinds of solutions. The first one consists in including some means of communication or cooperation between applications. Those who promote the "Unix philosophy" are in favor of that; they believe that each program should do only one thing, and that we should build applications by linking programs together with pipes through which textual data flows. This approach is used in Linux for moderately complex programs, because in practice, the more complex the application, the even more complex the required "plumbing".

The second kind of solution is to create "big" but extensible software, most of the time by means of an embedded programming language. Popular examples are web browsers like Firefox; one may also think of office suites. The drawback is that this extensibility is often limited to the domain of the application.

The last kind of solution consists in programming by ourselves what we need. We build our application by assembling software components with a scripting (or "glue") language.

Note that all three require some amount of programming.

I believe that the last approach is the best one, because it doesn't suffer from the limitations or constraints of the others: using a true programming language allows us to manipulate data in its original form rather than in a textual form, which makes things easier. And by using software components, we are able to freely combine functions that belong to completely different domains.
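
As a small illustration of the "original form versus textual form" point, here is a hedged Python sketch. The two functions and the file names are invented for the example. The first function consumes whitespace-separated text lines, as a stage in a pipe would; the second consumes structured records.

```python
def total_size_from_text(lines):
    """Sum sizes from lines of the form '<name> <size>', split on whitespace."""
    total = 0
    for line in lines:
        # Breaks as soon as a name contains a space: split() then yields
        # three fields and the unpacking raises ValueError.
        name, size = line.split()
        total += int(size)
    return total

def total_size_from_records(records):
    """Sum sizes from (name, size) tuples; names may contain any character."""
    return sum(size for _name, size in records)
```

With the records [("my notes.txt", 10), ("todo.woim", 5)], the structured version simply adds the sizes, whereas the textual version fails on the line "my notes.txt 10" unless extra quoting or escaping rules are added to the plumbing.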

There are already plenty of languages for this, like Lua or Python. However, none of them gathers the set of features that I think are required to let those who feel able to do so program what they want by themselves.

These features are:

  • Early checks to flag the most obvious programming mistakes. The cornerstone of this is static type checking. Most scripting languages check very few things; as a consequence, a stupid typo can lead to a hard-to-find bug.
  • Accepting that programming is a means, not an end. Most languages are designed on the assumption that either you want to become a "true" programmer, or you already are one. Few assume that you program on the weekend just as others put up shelves.
  • Correctness. Scripting languages are often designed with the idea that scripts are small, personal, throw-away programs that don't need to be reliable or entirely correct. From my perspective, and in my experience, a program that deserved to be written is a program that might be reused and enhanced. Writing a correct, reliable and robust program from the beginning is most of the time a beneficial investment.
  • Efficiency. Scripting languages are, by nature, slower and more memory-hungry than others, but some are abnormally so. Greater efficiency delays the moment when programmers have to rethink their program because it is way too slow.
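
To make the "stupid typo" of the first point concrete, here is a minimal Python sketch. The Account class and the misspelled attribute are invented for the example. At run time nothing complains: the assignment silently creates a new attribute and the real balance is left untouched.

```python
class Account:
    def __init__(self) -> None:
        self.balance = 0

def deposit_with_typo(account: Account, amount: int) -> Account:
    # Typo: 'balence' instead of 'balance'. Python accepts this and
    # creates a brand-new attribute, so no error is raised here.
    account.balence = account.balance + amount
    return account

acct = deposit_with_typo(Account(), 100)
# acct.balance is still 0; the deposit went into acct.balence.
```

The bug only shows up later, far from its cause, when something reads the balance. A static check that knows Account has no such attribute can reject the misspelling before the program ever runs, which is the kind of early check meant above.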

This is why I had to create Lama.

Although Lama is designed to be approachable, I do not promise you that things will always be simple, smooth and easy. The typical Lama user I have in mind is someone with some scientific education, or at least a logical mind. Prior programming experience is welcome, but not required.

That is because programming requires this (along with some patience and self-control); those who promise you Easiness incarnate are always deceiving you in one way or another.

This means that you will have to invest some effort to read, experiment and understand, for instance, the difference between an entity and a reference to an entity; the concept of a type or of a closure may also give you a hard time.


Sunday, 11 July 2010

Better than GCC?

This one is of the sadly-funny kind.

Lama, my little programming language project, tries to spot the usual programming mistakes: using a variable before it has been initialized, returning an undefined value, etc. I am kind of proud of that, because it is quite reliable and, to my surprise, has already pointed out such errors to me a couple of times.

The latest occasion was today. I was implementing an extension library for the first time, for accessing files. On the Lama side, I wrote a quick and dirty test script that goes like this:


var testFile File
when openFile "todo.woim"
is File: set testFile, it
is BadFilename: print "bad filename"
is PermError: print "permission error"
is NoDevError: print "nodev"
is ResExhausted: print "sys error"
is Bug: print "bug"
end

when read testFile 100
is String: print it
is Nihil: print "EOF"
is FileClosedError: print "closed file"
is IOError: print "IO error"
end

Lama pointed out to me that at the line 'when read testFile 100', the variable 'testFile' is used before initialization, which is correct because it is not initialized in every case of the previous 'when' statement. 'when' statements are like 'switch' statements, or the pattern matching of FP languages, except that they deal with the members of a tagged union, which have distinct types (most of them are actually unit types here).

Lama is unusually strict about that for a scripting language. This is precisely what thwarts the "quick and dirty" temptation.
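
For comparison, here is a hedged Python rendition of the same script shape. The open_file function and the File, BadFilename and PermError classes are hypothetical stand-ins for the Lama library, not its real API, and the failure conditions are made up. Python accepts the first function without complaint and only fails at run time, on the branches where the variable was never bound.

```python
from dataclasses import dataclass

@dataclass
class File:
    name: str

class BadFilename:
    pass

class PermError:
    pass

def open_file(name):
    """Hypothetical stand-in for openFile: returns one member of a tagged union."""
    if name == "":
        return BadFilename()
    if name.startswith("/root/"):
        return PermError()
    return File(name)

def read_after_open(name):
    result = open_file(name)
    if isinstance(result, File):
        test_file = result
    elif isinstance(result, BadFilename):
        print("bad filename")
    elif isinstance(result, PermError):
        print("permission error")
    # test_file is bound only in the File branch; on the other branches
    # this line raises UnboundLocalError at run time.
    return test_file.name

def read_nested(name):
    # Same logic with the use moved inside the File branch, so the
    # variable cannot be read before it is bound.
    result = open_file(name)
    if isinstance(result, File):
        return result.name
    return None
```

read_after_open("todo.woim") works, which is exactly why the quick and dirty version is tempting: the mistake stays hidden until an error path is actually taken.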

I moved the latter 'when' into the first 'is' branch of the former one, but Lama was still crashing. When I looked back at the function on the C side, the first mistake I saw was... that I was using an uninitialized variable!

And GCC didn't warn me about that, despite -Wall, -Wextra and whatnot.

Lama is a very simple language, and its implementation uses brute force at will; it doesn't even build abstract syntax trees or do SSA transforms, the kind of thing respectable compilers do.

And still, it does a better job than GCC at spotting obvious programming mistakes. It is funny to think that my little home-made toy compiler of 4 KLOC does better than a multi-megabyte one implemented by experts.

Edit:
It appears that one gets more warnings when one switches from -g to -O2; how is it that a debugging switch masks potential bugs?


Friday, 19 March 2010

When can code duplication be better?

Follow-up to a conversation on reddit:

C and C++ make it hard to read a register for its side effects (main article)

my comment:

"It seems to me that when one is concerned about these issues, and checks the compiler's output, one is also able to write the desired machine code.
So why not just use inline assembly in the first place?"

A redditor:

"It doesn't hurt to write portable code even in this context. You may have to run your code on a PPC instead of an ARM, but targeting the same peripheral."

As it happens, I worked a lot on the Linux kernel this week; I tried to add support for a development board we bought, which happens to be ARM-based.

The thing to know is that ARM is not just one processor; it is a large family of processors, perhaps as big as the m68k family. As a result, there's a large collection of processor-specific source code files in the kernel.

There could be more, actually. It's kind of a tautology, but within a family the differences between two processors are minor compared to what they have in common. Often it boils down to some extra built-in peripheral support.

As a consequence, and it looks like a good thing at first glance, the Linux maintainers chose to put the sharable code into common files, and to use various tricks for the parts that differ.

Except that the problem is multiplied by the number of boards that may use a chip, because different boards are often "wired" differently.

In particular, in the part of the kernel I was working on, they handled that not only with the usual #ifdef conditional compilation tricks, but also with the include file search path order.

Conditional compilation is brittle, because a single typo can accidentally select the wrong code; it is also harder to read. The include search path trick is even worse; one has to read a horribly long compilation command in order to work out the actual inclusion order. When you see that some compilation options (including the -I ones, which specify the include search path) are duplicated, probably because the makefile machinery is not totally under control, you start to think that something is wrong.

The bottom line is that, in order to avoid duplicate code, we end up with a build recipe that can give one a headache very quickly.

It seems to me that the total energy spent on the DRY effort outweighs the cost of just duplicating the code. This is a rather particular and extreme case, but it shows that the DRY principle isn't always appropriate.
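
The trade-off can be sketched outside the kernel with a small Python analogy. The board names and initialisation steps below are invented and do not come from any real kernel file. The shared function plays the role of the common file full of conditionals; the two per-board functions are the duplicated alternative.

```python
# Shared version: one function, behaviour selected by conditions on the
# board name (the analogue of #ifdef blocks in a common source file).
def init_uart_shared(board):
    steps = ["enable clock", "map registers"]
    if board == "board_b":
        steps.append("route pins via mux")
    if board != "board_b":
        steps.append("use default pins")
    steps.append("set baud rate")
    return steps

# Duplicated version: each board has its own straight-line function.
def init_uart_board_a():
    return ["enable clock", "map registers", "use default pins", "set baud rate"]

def init_uart_board_b():
    return ["enable clock", "map registers", "route pins via mux", "set baud rate"]
```

Both variants produce the same sequences. The duplicated functions repeat three of their four steps, but each can be read top to bottom without evaluating any condition, and a typo in one board's name cannot silently select another board's code path.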


Saturday, 13 March 2010

13 March

Hard to believe. The CIA allegedly "spiked" the bread of a small village in the south of France in 1951? (in English)

Internet or Minitel 2.0?

More:
http://faq.tuxfamily.org/InternetLibre_ou_Minitel_2.0/Fr

http://wiki.auto-hebergement.fr/

"Vector conflict Siege" - Flash game

Slayer - Raining Blood - 8-bit version

Steambirds - Flash game; a tactical turn-based mini-wargame with a touch of real time.

Retroshoot360 - Flash game. Arena-style shoot 'em up. Hard, but addictive.