Tuesday, October 20, 2009

Member access syntax: dots, brackets, and functions

I've spent some time thinking about common alternative syntaxes for accessing object members. By members, I mean methods or fields or whatever. For example, we have the struct dot syntax:

person.age
array.length


And so on. C++ extended C to use the same syntax for method calls:

person.age() or person.getAge()
array.length()
pen.moveTo(x, y)


Languages like Eiffel, Ruby, or Fan that treat all method calls as abstract messages don't distinguish between field access and method calls.

Anyway, then we have array or hashtable access syntax, a different kind of membership:

persons[4]
properties["background"]


And languages like JavaScript unify the two; properties.background and properties["background"] mean the same thing. There's some nicety to that.
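
To make that unification concrete, here is a rough Python sketch of a container where the dot form and the bracket form read and write the same storage. The class name `Properties` and its internals are my own invention for illustration; this only imitates the JavaScript behavior and says nothing about how JavaScript implements it.

```python
class Properties:
    """Hypothetical container: obj.name and obj["name"] share one dict."""

    def __init__(self, **entries):
        # Bypass our own __setattr__ so the backing dict is a real attribute.
        object.__setattr__(self, "_entries", dict(entries))

    def __getitem__(self, key):
        return self._entries[key]

    def __setitem__(self, key, value):
        self._entries[key] = value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        entries = object.__getattribute__(self, "_entries")
        try:
            return entries[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._entries[name] = value


properties = Properties(background="blue")
properties.foreground = "white"      # dot form writes ...
print(properties["foreground"])      # ... bracket form reads the same entry
print(properties.background == properties["background"])
```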

But I'm working through a series of issues that will eventually get tangled. First, I need to go back to C. I can abstract member access there, too:

age(person)
length(array)
moveTo(pen, x, y)


I'll pretend to have those namespaced for the moment (since otherwise in C they'd have uglier names), and if I have dynamic dispatch on the first (or more) arguments, this syntax could have the same meaning as virtual methods in C++ or Java or whatnot. For example, MATLAB can do OOP method dispatch with standard function call syntax.
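
As a small sketch of dispatching on the first argument with plain function call syntax, Python's standard `functools.singledispatch` will do. The `Person` and `Tree` types and their fields are made up for the example.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Person:
    years: int


@dataclass
class Tree:
    rings: int


@singledispatch
def age(thing):
    # Fallback when no implementation is registered for the argument's type.
    raise TypeError(f"age() is not defined for {type(thing).__name__}")


@age.register
def _(thing: Person):
    return thing.years


@age.register
def _(thing: Tree):
    # One ring per year, roughly.
    return thing.rings


# Same call syntax, implementation chosen by the type of the first argument.
print(age(Person(years=30)))
print(age(Tree(rings=120)))
```

Here `age(person)` plays the role that `person.age()` would play as a virtual method.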

So the syntax question becomes: do you prefer nested function calls or postfix member dot access? I think the postfix (or somewhat infix) C++ style is easier to understand:

toUpper(firstName(person)) vs.
person.firstName.toUpper


The chain is just easier to read. And apparently the C# folks agreed strongly enough that they introduced extension methods, a semi-complicated way of allowing dot chain syntax. Why not just make a language such that both syntaxes are equivalent?
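
Python has no such equivalence built in, but it can be faked well enough to compare the two spellings side by side. This is only a sketch: the `Dot` wrapper, the `member` decorator, and the sample functions are all hypothetical names of mine, and a real language would resolve this at the syntax level rather than through a wrapper object.

```python
class Dot:
    """Hypothetical wrapper: Dot(x).f is evaluated as f(x)."""

    functions = {}  # registry of plain functions usable in dot position

    def __init__(self, value):
        self.value = value

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so 'value' never lands here.
        try:
            function = Dot.functions[name]
        except KeyError:
            raise AttributeError(name) from None
        return Dot(function(self.value))


def member(function):
    """Register a plain function so it can also be used after a dot."""
    Dot.functions[function.__name__] = function
    return function


@member
def first_name(person):
    return person["first_name"]


@member
def to_upper(text):
    return text.upper()


person = {"first_name": "Ada"}
nested = to_upper(first_name(person))             # nested function calls
chained = Dot(person).first_name.to_upper.value   # dot chain
print(nested == chained)
```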

One reservation. Here's that tangle to which I was referring. Let's look at arrays again. Why do we really need a separate array access syntax vs. function call syntax? An array is just a function that hardcodes the responses, in one way of looking at it. So, like Scala or MATLAB, we could make them the same:

persons(4) vs.
persons[4]


And that frees up brackets for other uses (like generics in Scala). But if, like JavaScript, we consider struct membership and array-ish access to be the same, then 'persons[4]' is the same as 'persons.4', and that would be the same as '4(persons)', and all the worse if it's also the same as 'persons(4)'.
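
The "array is just a function" view is easy to sketch in Python by giving a sequence a call operator. The `Table` class and the sample names are assumptions for illustration only.

```python
class Table:
    """Hypothetical array that can also be applied like a function."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __call__(self, index):
        # Calling the array with an index is the same as indexing it.
        return self[index]


persons = Table(["Ann", "Bob", "Cy", "Di", "Ed"])

print(persons(4) == persons[4])

# Because it is callable, the array can go wherever a function is expected.
print(list(map(persons, [0, 2])))
```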

So, it tells me that array/hashtable lookup should best be separate syntax from function calls. I like it ('persons[4]') as shorthand for notions like 'persons.get(4)'. I think otherwise the unification of concepts leads to entanglement.
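
Python happens to work this way already: brackets are sugar for a method call. A minimal sketch, with a hypothetical `Persons` class and a free function `get` standing in for the function-call spelling of the same member:

```python
class Persons:
    """Hypothetical collection whose brackets are shorthand for get()."""

    def __init__(self, names):
        self._names = list(names)

    def get(self, index):
        return self._names[index]

    def __getitem__(self, index):
        # persons[4] is nothing more than persons.get(4).
        return self.get(index)


def get(collection, index):
    # Function-call spelling of the same member: get(persons, 4).
    return collection.get(index)


persons = Persons(["Ann", "Bob", "Cy", "Di", "Ed"])
print(persons[4] == persons.get(4) == get(persons, 4))
```

The key (4) stays an argument in every spelling and never becomes a method name itself.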

At least, I haven't worked out another solution I like better.

5 comments:

  1. Why not person.4? Identifiers typically never start with a number, so that's reasonably easy to parse...

    --
    Cedric

  2. Cedric, thanks for the comment. I did have that mentioned in passing in a paragraph as part of my syntax equivalence dilemma. In the sense that I'm claiming 'persons.4' is equivalent to '4(persons)', and that just doesn't seem right to me.

    (One of my troubles here is that I threw this post together a bit too fast. On limited time, sometimes I just make sure to post before I forget a thought that I might want to remember in the future. But it means I might have a less than ideal presentation of the subject.)

  3. Or in other words, 'persons.4' makes sense but its equivalent '4(persons)' doesn't. Also, 'persons(4)' makes sense but its equivalent '4.persons' doesn't.

    However, 'persons.get(4)' and its equivalent 'get(persons, 4)' both make sense. Given that, 'persons[4]' as shorthand for 'persons.get(4)' also seems reasonable.

    Maybe another view of the theme is that object keys shouldn't be used directly as method names.

  4. I've given this a lot of thought too recently. And I think the important difference between dot access and [] access is that typically dot access uses only an identifier (which never conflicts with a local variable), but the [] operator allows any arbitrary expression.

  5. Brian, good point, which I'd not emphasized here (even though in my TA job I was emphasizing your point to beginning programmers last week). Still, JavaScript's unification of the namespace for both syntaxes is interesting. But I think your point relates to the "object key" concept I last mentioned and helps provide weight in that direction.
