Discussion:
How to handle jaywalking - parsing a comma-delimited value in field and do lookups (1,4,5,6...)
Snaky Love
2010-07-19 10:16:47 UTC
Permalink
I have an "interesting" problem here - in a given mysql database
scheme some sql wizard used comma-separated values in a text-field
and
with that values lookups have to be done. The data itself is simple
numbers like 1,34,25,66,78,134 and so on. So what I have is something
like this:

id | username | groups | more...
---------------------------------------------
1 | name | 1,23,4,55,6 | ...
2 | name2 | 3,2,4,5 | ...

The groups string can be very long. there is also a table "groups"
with "id, groupname", as expected, from which the lookup has do be
done..

Yes, this is bad design. No, I can not immediately change the design,
I will try to, but atm I have to handle the situation as it is.

Of course normally this would be solved with an intersection table
and
a many-to-many relation. For anybody interested: there is a book by
Bill Karwin called "SQL Antipatterns" - he names the described design
Jaywalking and it is the first antipattern in the book. I was
surprised to find it in real life....

So my question is: how to handle this with web2py? I really would
like to use web2py to build a nice management interface on top of
that
tables, but currently I do not know how to go on with that jaywalking
antipattern in my way.

My first idea was to create my own intersection table - but the
process of [re-]converting data does only work in a
static environment - the data is heavily used and so
transformations of tabledata would be neccessary on every request to
mirror the live situation...too slow! So I am looking for a good way
to implement some kind of layer that would translate this field
with comma-delimited data into something that can be used with web2py
crud forms.

What do you think? How to handle this?

Thank you very much for your attention!
Snaky
Thadeus Burgess
2010-07-19 10:45:36 UTC
Permalink
Web2py is actually guilty of performing this same operation, however instead
of commas, we use the pipe char '|'

The theory works like this

If you have the following

|1|23|4|55|15|

You can perform the following query to find the record with group id of 55

db(db.groups.like("%|55|%")).select()

This works as expected. However, in your case you have a problem. The fact
that the beginning and end of your
records don't wrap with a comma! So if you wanted to look for the group 1 in
your dataset... well it just wouldn't work. You can't do a like query on
"%1%" either, because that would also match if you have group 15. Dope.

Is it possible for you to run an update query on the table, and wrap every
record with a beginning and ending comma? If so, you can just use the simple
trick we use in web2py but use a comma instead of a pipe. Otherwise, you
might be better off attempting to re-design the system (you should at some
point anyways)

Hope that helps.

--
Thadeus
Post by Snaky Love
any-to-many relation. For anybody interested: there is a book by
Bill Karwin called "SQL Antipatterns" - he names the described design
Jaywalking and it is the first antipattern in the book. I was
surprised to find it in rea
Snaky Love
2010-07-19 11:45:12 UTC
Permalink
Hi Thadeus,

I do not understand what you mean with
Post by Thadeus Burgess
Web2py is actually guilty of performing this same operation, however instead
of commas, we use the pipe char '|'
where does this kind of thing happen with web2py? Could you please
specify what exactly are you writing about here? Does this mean the
web2py database abstraction layer produces this kind of hackery
instead of intersection tables on a many-2-many relation?

Of course I will check this out for myself sometime this week maybe...
I can not believe it!
Post by Thadeus Burgess
From (not only) my POV this kind of "trick" brings many problems...
one of the problems is that it is nearly impossible to access the data
with standard reporting tools, frameworks, database management
software... I think it is a good think not to hide data behind a
"trick".
Post by Thadeus Burgess
Is it possible for you to run an update query on the table, and wrap every
record with a beginning and ending comma?
No, it is not possible atm to change anything. The original developers
also had the idea of using a "trick", and if I change anything the
original application will break. Yes, I can copy the database, but I
am not asking for a workaround here, I would like to know if there is
a good way to handle this kind of problem with web2py. I am also
researching this for other frameworks.

Thank you very much for your attention!
Thadeus Burgess
2010-07-19 12:41:20 UTC
Permalink
Post by Snaky Love
where does this kind of thing happen with web2py? Could you please
specify what exactly are you writing about here? Does this mean the
web2py database abstraction layer produces this kind of hackery
instead of intersection tables on a many-2-many relation?
Consider the following schema.

db.define_table('groups', Field('name'))

db.define_table('accounts',
Field('name'),
Field('group_id', db.groups, requires=IS_IN_DB(db, 'groups.id', '
groups.name', multiple=True)
)

In this case, what web2py actually does is create a string field (because
you specified multiple = True), and then
when you insert a record using SQLFORM, it will automatically parse the
multiple checkboxes into the jaywalking trick.

Its stupid, thats another reason to add to my very long list of dislikes for
SQLFORM and how web2py couples these things together. I would much rather a
Field('group_id', db.groups, manytomany=True) type that the DAL would handle
automatically for me... but that is getting into ORM space, so the right
thing to do is to handle many to many relationships manually.

That is all fine and dandy, back to the problem at hand.

I'm afraid though that anything will be hacky because of the design flaw.
Hopefully there are no spaces in the data? You might have to do something
such as

I want record 5 out of the following set

5,55,32,12,16,15

Well, I need to look for a single 5 that could appear at the beginning of
the string, the end of the string, or somewhere in the middle of the comma
separated values... nasty, but it might look something like this:

db((db.groups.like('5,%')) | (db.groups.like(',5%') |
(db.groups.like('%,5,%')).select()

This won't be fast though. You won't be happy with the performance in speed.
BUT it will work.

--
Thadeus
Post by Snaky Love
where does this kind of thing happen with web2py? Could you please
specify what exactly are you writing about here? Does this mean the
web2py database abstraction layer produces this kind of hackery
instead of intersection tables on a many-2-many relation?
Snaky Love
2010-07-19 15:30:31 UTC
Permalink
Hi Thadeus,

thank you very much for your answer once more.

I am really *very* surprised to see that a University professor in
Computer Science and Software Engineering does promote this kind of
software design...

Thadeus, would you like to send me a pointer to your "very long list
of dislikes for SQLFORM"? I am really interested in your findings!

Thanks again for your attention,
Snaky

Loading...