Hi, I am sorry it took so long to reply; I really struggled with this email.
There is so much I would like to say, but I did not find a good way to present
it and I am out of time.
Anyway, this is a collection of my notes. Hopefully you can make something of
them.
Feel free to contact me on freenode-IRC (nick: arnarl) if you want me to
elaborate on any point.
Martin, you had 3 questions for me in your (private) mail.
1. You'd like better categorization in Plone, not just folders and smart
folders and were looking for opinions.
You also mentioned some approaches in your mail to the list.
2. You wanted to know what sort of categorization systems "work in the real
world", mentioning RDF triples and tag clouds as examples of systems you
have heard of.
3. Something that can be quickly implemented in Plone but that you do not
have to rework for the next version of Plone.
First, you need to figure out what functionality you want to offer and how you
would like to present it to the end-user before you select any technology.
Interaction design is the key phrase. Keep it simple. We learned early on that
all the flexibility in the world does not matter if our users are unable to
understand or use the features.
You may find RDF or Topic Maps to be feature overkill, at least if all you
want is improved classification. (More on the technologies later.)
Folders
=======
Folders are a semantically poor categorization system, with only
contains/part-of relationships.
Another problem with folders is that one often arbitrarily decides to follow
physical constraints that do not apply to information in a computer, such as
"An object may only reside in a single folder".
Tagging
=======
Tagging works well enough when you categorize and classify for yourself, but
tags have properties that might make them unsuitable for document collections.
Advantages:
1. Tags capture your own context at the time of tagging.
2. Tagging makes you stop and think about what you are classifying.
3. Tags use your own words and perspectives of what is important.
None of these work as well when you are classifying for others:
1. Visitors do not share your context, experiences or view of the world.
(This is a problem for other categorization systems as well, but the
freeform nature of tagging )
2. Folksonomies (à la del.icio.us) work because you can, to a certain extent,
rely on crowd wisdom. This may not be possible in other systems.
3. Tags tend to be generic, meaning lots of hits for each tag.
(This suggests that it is the intersection of tags which is interesting,
and a decent interface might overcome this. See Jon Udell.)
4. Synonyms, or multiple tags meaning the same. (Or do they?)
5. Tags may be problematic wrt. multilingual websites.
6. Tags are free-form, so how do you manage them?
All of the problems with tagging or similar keyword based systems may be worked
around, though I believe you simply end up creating an ontology language.
(Often a poor replacement, though that does not mean it is without value).
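As an illustration of the tag-intersection workaround mentioned in point 3, here
is a minimal sketch in plain Python. The function names and the sample documents
are made up for this example; it is not code from Plone or ZTM.

```python
from collections import defaultdict


def build_tag_index(items):
    """Map each (lower-cased) tag to the set of item ids carrying it."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            index[tag.lower()].add(item_id)
    return index


def items_with_all(index, *tags):
    """Return the ids of items that carry every one of the given tags."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


# Hypothetical sample data.
documents = {
    "doc1": ["plone", "howto"],
    "doc2": ["plone", "topicmaps"],
    "doc3": ["topicmaps", "rdf"],
}
index = build_tag_index(documents)
plone_and_topicmaps = items_with_all(index, "plone", "topicmaps")
```

Each single tag matches two documents here, while the intersection narrows the
result to one. An interface that lets users combine tags relies on exactly this
kind of lookup.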
While cool, tag clouds are just a visualization and perhaps not the best way of
navigating a document collection. (Nor are they a well-known pattern from a
usability perspective.)
Finally, even poor tags are better than nothing.
Controlled vocabularies or taxonomic classification
===================================================
One of the challenges of being an editor is to get your contributors to use
your categorization system, no matter what technique it is.
Classification carries a cognitive cost, meaning that it is work, even hard
work. You want your system to minimize the cost of doing this (what tagging
does so well). AJAX techniques *really* help here.
Additionally you want to reward good classification. In our systems we do this
by making well classified items appear more than non-classified items. That
includes search.
Contributors want their hard work to be visible. (At least some do.)
Technique: Search
=================
Make sure you include the classification in the search results.
In Zope you can cheat and make classification words appear multiple times in
SearchableText, thereby giving an item more weight for those words.
We use variants of this technique to get relevance weighting into our search
results.
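A rough sketch of the repetition trick, written as a plain Python function
rather than actual Zope code. The function name and the `weight` parameter are
assumptions for illustration; in a real content class the returned string would
be whatever your SearchableText method returns.

```python
def weighted_searchable_text(title, body, categories, weight=3):
    """Build the text to index, repeating each category `weight` times.

    A text index that ranks by term frequency will then score the item
    higher for its classification words than for words used only once.
    """
    parts = [title, body]
    for category in categories:
        parts.extend([category] * weight)
    return " ".join(part for part in parts if part)


# Hypothetical item classified under "finance".
text = weighted_searchable_text(
    "Annual report", "Figures for the year.", ["finance"], weight=3
)
```

How much the repetition actually changes the ranking depends on the scoring
used by the text index, so the weight needs tuning against real queries.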
A way too short intro to topic maps
===================================
Topic Maps are built from sets of topics and are simply a generic, standardized
data structure, in much the same way as RDF or relational databases.
Topics are just proxies for subjects in "the universe of discourse", serialized
so that computers can "say" things about them.
That's a fancy way of saying that a topic can represent both abstract concepts
like "love" as well as concrete physical items like cars or a document in a
document management system.
A Topic may have 4 kinds of properties in addition to rich mechanisms for
identity:
* Types (Multiple types are surprisingly useful, though we do not use them much)
* Names (Like most things in the real world topics may have multiple names)
* Occurrences (Pointers to information about the subject of the topic)
* Roles (How the topic relates to other topics)
Names, occurrences and roles are typed. The types are not integers, floats,
strings and similar traditional basic programming language types, but rather
other topics that represent the concept.
Examples of such types might be "employee" in relation to a company, or an
occurrence of a biographic article on a person.
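To make the structure a bit more concrete, here is a minimal sketch of the four
kinds of properties in Python. It is only an illustration: the class and
function names are my own assumptions, it ignores identity and scoping, and it
is not how ZTM or any topic map engine is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Topic:
    """A proxy for a subject; every list below holds typed statements."""
    types: List["Topic"] = field(default_factory=list)
    names: List["Name"] = field(default_factory=list)
    occurrences: List["Occurrence"] = field(default_factory=list)
    roles: List["Role"] = field(default_factory=list)


@dataclass(eq=False)
class Name:
    value: str
    type: Optional[Topic] = None  # the kind of name, itself a topic


@dataclass(eq=False)
class Occurrence:
    locator: str  # pointer to information about the subject
    type: Topic   # e.g. a "biography" topic


@dataclass(eq=False)
class Association:
    type: Topic
    roles: List["Role"] = field(default_factory=list)


@dataclass(eq=False)
class Role:
    type: Topic    # e.g. an "employee" topic
    player: Topic  # the topic playing this role
    association: Association


def new_topic(label, *types):
    """Create a topic with one untyped name and the given types."""
    return Topic(types=list(types), names=[Name(label)])


def associate(assoc_type, *players):
    """Link topics; `players` are (role_type, topic) pairs."""
    association = Association(type=assoc_type)
    for role_type, topic in players:
        role = Role(type=role_type, player=topic, association=association)
        association.roles.append(role)
        topic.roles.append(role)
    return association


# The types are ordinary topics, not strings or integers.
person, company = new_topic("person"), new_topic("company")
employment = new_topic("employment")
employee, employer = new_topic("employee"), new_topic("employer")
biography = new_topic("biography")

# Hypothetical example data.
jane = new_topic("Jane Doe", person)
acme = new_topic("Example Corp", company)
jane.occurrences.append(Occurrence("http://example.org/bio/jane", biography))
job = associate(employment, (employee, jane), (employer, acme))
```

Note that "employee" and "biography" are topics in their own right, so you can
give them names, occurrences and associations just like any other topic.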
I haven't even touched on important concepts like topic identity and scoping.
For a better introduction I recommend Steve Pepper's "The TAO of Topic Maps".
On topic management
===================
Topic management is just a name we've made up to describe the way we administer
the ontology, the taxonomy and the knowledgebase using familiar content
management techniques like forms, workflow etc.
It is surprisingly powerful, and in some ways it is a better version of TTW
programming.
Our presentation at EuroPython wasn't really about classification issues but
rather about how we store the schema in the topicmap and the database. Think of
it as TTW Archetypes on steroids.
<http://www.python-in-business.org/ep2005/download.cpy?document=10778>
The taxonomy or information architecture is built from the central topics in
the topicmap. The taxonomy is simply built from special associations and may
thus contain every kind of topic in the topic map.
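A small sketch of a taxonomy built from "broader than" associations, where a
node may have several parents. The class and method names are hypothetical, and
nodes are plain strings here only to keep the example short; in a topic map
they would be topics.

```python
from collections import defaultdict


class Taxonomy:
    """Polyhierarchy stored as narrower -> set of broader nodes."""

    def __init__(self):
        self._broader = defaultdict(set)

    def add_broader(self, narrower, broader):
        """Record one association: `narrower` sits below `broader`."""
        self._broader[narrower].add(broader)

    def ancestors(self, node):
        """Return every node reachable by following broader links upwards."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._broader.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical example: "topic maps" has two parents.
taxonomy = Taxonomy()
taxonomy.add_broader("topic maps", "metadata")
taxonomy.add_broader("topic maps", "knowledge management")
taxonomy.add_broader("metadata", "information science")
taxonomy.add_broader("knowledge management", "information science")
broader_nodes = taxonomy.ancestors("topic maps")
```

An item categorized under "topic maps" can then be listed on the pages of all
its broader nodes without being copied into several folders.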
Classification vs. Categorization
=================================
Distinguishing between what something *is* and what it *is about* may help
clarify some concepts and reveal some new options.
Consider that the taxonomy might contain type-nodes whose page may list all
instances.
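The distinction can be sketched like this, assuming a made-up `Item` class with
one set for what the item *is* and another for what it *is about*. The names
and sample items are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Item:
    title: str
    is_a: Set[str] = field(default_factory=set)   # classification: what it is
    about: Set[str] = field(default_factory=set)  # categorization: its subject


def instances_of(items, type_name):
    """What a type-node page would list: every item that *is* that type."""
    return [item.title for item in items if type_name in item.is_a]


def items_about(items, subject):
    """What a subject page would list: every item *about* that subject."""
    return [item.title for item in items if subject in item.about]


# Hypothetical items: one is a press release, the other is about press releases.
items = [
    Item("Example release", is_a={"press release"}, about={"plone"}),
    Item("Writing good releases", is_a={"article"}, about={"press release"}),
]
release_instances = instances_of(items, "press release")
release_coverage = items_about(items, "press release")
```

The same node, "press release", gives two different listings depending on
which relationship you follow.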
On classification and Plone
===========================
I recommend studying literature on patterns and usage of existing technologies
like RDF and Topic Maps before implementing your own classification system. I
certainly found that very illuminating and it helped to clarify my own
thinking.
Very smart people have thought about these problems for decades, and repeating
their experiments and mistakes might be time consuming.
Lars Marius Garshol's "Metadata? Thesauri? Taxonomies? Topic Maps!" at
<http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html> is a good
read.
On Topic Maps and Plone
=======================
I firmly believe that a proper topic management implementation could catapult
Plone far ahead of almost all other content management systems out there.
We would love to introduce you to all the aspects of ZTM and see them adopted
(even partially) by Plone. Perhaps we can find a suitable time and location for
a workshop.
Topic Maps are starting to appear on our other platforms (Microsoft, Oracle,
SAP, Java) and so far it looks like a success.
On ZTM and Plone
================
Like Limi wrote on Saturday, I don't consider ZTM 2.X a good option for Plone
due to architectural weaknesses in Zope 2.X that we work around.
Basically an object in the object file system (ZODB) does not know its parent
due to old issues with circular references in Python 1.X.
This weakness means that using direct python-level object-references, while
handled beautifully by ZODB, is problematic. You need to ensure that the
security context is correct yourself.
I believe this is part of the reason the reference-engine relies on
portal_catalog. (At least it did the last time I read the code)
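The problem can be modelled in a few lines of plain Python. This is a
deliberately simplified model and not the Zope 2 Acquisition API; every class
and function name below is an assumption made for the illustration.

```python
class Document:
    """Stored object: holds its own data but no pointer to its container."""

    def __init__(self, title):
        self.title = title
        self.related = []  # direct Python references to other documents


class InContext:
    """Wrapper created at lookup time; remembers where the object was found."""

    def __init__(self, obj, parent):
        self.obj = obj
        self.parent = parent


class Folder:
    def __init__(self, name, allowed_users):
        self.name = name
        self.allowed_users = set(allowed_users)
        self._items = {}

    def add(self, key, obj):
        self._items[key] = obj

    def lookup(self, key):
        # Going through the container yields the object *with* its context.
        return InContext(self._items[key], parent=self)


def may_view(user, found):
    """Permission check that needs the container, and so needs the wrapper."""
    return user in found.parent.allowed_users


public = Folder("public", allowed_users={"anonymous", "editor"})
private = Folder("private", allowed_users={"editor"})
report = Document("Public report")
draft = Document("Internal draft")
public.add("report", report)
private.add("draft", draft)

report.related.append(draft)  # direct reference, stored without any context

via_folder = private.lookup("draft")  # has a parent, can be checked
via_reference = report.related[0]     # bare object, no parent to check against
```

Following the direct reference hands you the bare draft, so nothing tells you
it lives in the private folder. The code that follows such a reference has to
rebuild that context itself before any permission check makes sense.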
(ZTM is powerful on many levels, but you need to know some deep Zope voodoo to
use it effectively. This is due to the sad fact that I have never gotten around
to packaging it properly.)
The architectural weakness is fixed in Zope 3, which should make a topic map
engine much simpler to implement cleanly. (This applies to reference engines as
well.)
Some definitions
================
Ontology -- In a topic map context the topic types, their properties and
constraints. (More generally Tom Gruber defines an ontology as "a
specification of a conceptualization".)
Taxonomy -- A hierarchy (possibly polyhierarchy, nodes may have multiple
parents) of central concepts in the domain that one wishes to model. The
nodes are used as categories.
Topic Maps -- A standardized (by ISO) data model for capturing knowledge
structures and connecting them with information resources. Similar to RDF but
we considered them slightly more suited for content management.
Topic Management -- Administering the ontology, taxonomy and knowledgebase
using the same, well-known content management techniques.
ZTM 3
======
There is currently no functional prototype of our topic map engine for Zope 3,
but that is likely to change over the summer.