Let’s Build Our First Token Transformer

BRM is about rewriting and transforming Python source code. It lets you round-trip back to the original source through the power of tokens and the comfort of Transformer interfaces. In this tutorial we are going to focus on writing a square root character (√) transformer. We’ll handle all forms of the square root operation, like √9 or √16.0, and transform them into <number> ** 0.5.

Creating First Transformer

A Transformer is a class that provides utilities and dispatching for tokens. For example, you can create a class that does nothing but listen to what it receives. Let’s write one that listens for numbers.

First of all, you need to subclass TokenTransformer in order to add new methods. Let’s do that:

from brm import TokenTransformer

class NumberHandler(TokenTransformer):
    pass  # handler methods are added in the next steps

To register a handler for a specific token type, you define a method named visit_<token-type>. It works like ast.NodeTransformer, but with token types instead of nodes. You can get the names of all token types from the token module and its docs. Our token is NUMBER. If you want to see which tokens a Python expression or statement consists of, you can do so in an interactive shell by instantiating TokenTransformer and calling quick_tokenize on it.

>>> import pprint
>>> import brm
>>> transformer = brm.TokenTransformer()
>>> transformer.quick_tokenize("1.0")
[TokenInfo(type=2 (NUMBER), string='1.0', start=(1, 0), end=(1, 3), line='1.0')]
>>> transformer.quick_tokenize("100")
[TokenInfo(type=2 (NUMBER), string='100', start=(1, 0), end=(1, 3), line='100')]
>>> pprint.pprint(transformer.quick_tokenize("100 + 100"))
[TokenInfo(type=2 (NUMBER), string='100', start=(1, 0), end=(1, 3), line='100 + 100'),
 TokenInfo(type=54 (OP), string='+', start=(1, 4), end=(1, 5), line='100 + 100'),
 TokenInfo(type=2 (NUMBER), string='100', start=(1, 6), end=(1, 9), line='100 + 100')]
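For comparison, a similar listing can be produced with the standard library alone. The sketch below does not use BRM; token_names is a hypothetical helper written for this illustration, built on tokenize.generate_tokens and token.tok_name. It also shows the trailing tokens that the tokenizer appends at the end of the input.

```python
import io
import token
import tokenize


def token_names(source):
    """Return (type name, string) pairs for every token in `source`.

    Hypothetical helper for illustration; it is not part of BRM.
    """
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


pairs = token_names("100 + 100")
for name, string in pairs:
    print(name, repr(string))
```

The lowercase form of each type name (for example, number for NUMBER) is what follows the visit_ prefix in a handler method name.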

Let’s add a visit_number method and see what happens to our NumberHandler when we call it with some numbers.

# requires "import token" at the top of the module
def visit_number(self, number):
    print("is a number? (always yes)", number.type == token.NUMBER)
    print("what it contains?", number.string)
    print("where it starts?", "y_start={}, x_start={}".format(*number.start))
    print("where it end?", "y_end={}, x_end={}".format(*number.end))

After this definition, we are going to instantiate our NumberHandler and call transform on it. The transform method is responsible for everything related to source rewriting. It takes Python source as a string (like what you would read from a Python file) and registers new tokens if any are available (as we will do for our square root token). It then invokes transformer methods like the one we just created (visit_number). If no transformer method is defined for a token, it calls dummy. You can implement something like the following in your subclass to watch unhandled tokens.

def dummy(self, unknown_token):
    print("Unhandled token:", unknown_token)
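To make the dispatching idea concrete, here is a minimal sketch of name-based dispatch with a fallback, written against the standard tokenize module only. MiniDispatcher and its dispatch method are hypothetical names used for illustration; this is not BRM’s actual implementation, and it does no rewriting at all.

```python
import io
import token
import tokenize


class MiniDispatcher:
    """Hypothetical stand-in for a transformer's dispatch loop (not BRM code)."""

    def dispatch(self, source):
        results = []
        readline = io.StringIO(source).readline
        for tok in tokenize.generate_tokens(readline):
            # NUMBER -> "visit_number", NEWLINE -> "visit_newline", ...
            method_name = "visit_" + token.tok_name[tok.type].lower()
            # Fall back to dummy when no matching visit_* method exists.
            handler = getattr(self, method_name, self.dummy)
            results.append(handler(tok))
        return results

    def visit_number(self, tok):
        return "number:" + tok.string

    def dummy(self, tok):
        return "unhandled:" + token.tok_name[tok.type]


result = MiniDispatcher().dispatch("2")
print(result)
```

The getattr call with a default is what lets a subclass opt in to a token type simply by defining a method with the right name.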

Let’s test our visit_number and dummy out.

>>> number_handler = NumberHandler()
>>> number_handler.transform("2")
is a number? (always yes) True
what it contains? 2
where it starts? y_start=1, x_start=0
where it end? y_end=1, x_end=1
Unhandled token: TokenInfo(type=4 (NEWLINE), string='', start=(1, 1), end=(1, 2), line='')
Unhandled token: TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
'2'

It took "2" and returned "2", and it also answered our questions. Now what? What are those unhandled tokens? One of them is a newline token, as its name states. The other is a marker token that indicates we have reached the end of the input. We can check for that with token.ISEOF in our dummy function.

def dummy(self, unknown_token):
    if token.ISEOF(unknown_token.type):
        print("Reached EOF without a problem, congratz")
    else:
        print("Unhandled token:", unknown_token)

>>> number_handler = NumberHandler()
>>> number_handler.transform("2")
is a number? (always yes) True
what it contains? 2
where it starts? y_start=1, x_start=0
where it end? y_end=1, x_end=1
Unhandled token: TokenInfo(type=4 (NEWLINE), string='', start=(1, 1), end=(1, 2), line='')
Reached EOF without a problem, congratz
'2'
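The behaviour of token.ISEOF can also be checked outside of a transformer. The following sketch uses only the standard library; eof_flags is a hypothetical helper for this illustration, and it pairs each token type name with the result of token.ISEOF.

```python
import io
import token
import tokenize


def eof_flags(source):
    """Pair each token's type name with token.ISEOF(type).

    Hypothetical helper for illustration; it is not part of BRM.
    """
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], token.ISEOF(tok.type))
        for tok in tokenize.generate_tokens(readline)
    ]


flags = eof_flags("2")
for name, is_eof in flags:
    print(name, is_eof)
```

Only the final ENDMARKER token is reported as end of input, which is why the dummy method above can use the check to tell a clean finish apart from a genuinely unhandled token.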