Content Modeling for JCR

posted 09:39PM Jan 25, 2008 with tags fmc jcr modeling softwaredevelopment tips by Lars Trieloff

If you only read this weblog, you might have missed my post on Content Modeling for Java Content Repository (JCR) at dev.day.com.

http://weblogs.goshaky.com/weblogs/lars/resource/jcrmodeling2.png

In this entry I propose an alternative to content modeling with CND and ad-hoc notations based on combining aspects of UML and Fundamental Modeling Concepts.


Fighting Wiki SPAM

posted 09:57AM Jan 07, 2008 with tags google softwaredevelopment spam transformers wiki by Lars Trieloff

Social Software is software that gets spammed. This applies first and foremost to e-mail, but Wikis and Blogs are also preferred targets of spammers. The following rules should act as a guideline for everyone who designs Wiki software, evaluates Wiki software or needs to configure a Wiki that is under attack by spammers.
  1. Understand the way spammers think and work: The main goal of most wiki spammers is to create link spam that will lead search engine crawlers and algorithms, especially Google's, into giving their own or their customers' websites a higher rank for certain keywords. To achieve this goal, they try to create keyword-specific links wherever possible, and this means in your Wiki. To create a large number of links in a short time, they write small programs that know how your Wiki software works and send the correct requests to create new pages or new page revisions. As in the movie "Transformers", your wiki has become the playing field of a robot war: on the one side, the "destroy" side, are the spam-bots, on the other side the googlebot. To become more familiar with the way Wiki and Blog spammers think, I recommend The Register's "Interview with a link spammer".
  2. Do not be an attractive target: The best way of preventing Wiki spam is not being a target of Wiki spam. Spammers find Wikis that are vulnerable to SPAM attacks by searching on search engines for pages that have already been spammed by somebody else. A page that is spammed and found via a Google search is vulnerable and attractive, because the spammer knows that Google will see their spam. To avoid being an attractive target, it is important to remove all existing SPAM from the Wiki and make sure that SPAM is not going to be picked up by Google and other search engines. A mechanism that has been proposed to achieve this goal (and that has been found to be effective) is using the rel="nofollow" attribute in all links that could lead to SPAM. Some wiki software applies this to all outgoing links, some only to outgoing links that do not conform to a white list of allowed pages, and some only to outgoing links on newly edited pages. The most important rule however is: Exclude all archived versions of wiki pages from being indexed. If your archived pages are being indexed, the spam will be picked up by the search engines, no matter how fast you are to revert the changes. A good technique to achieve this is to use the <meta name="robots" content="noindex,nofollow"> tag in the head of all history or archive pages. To learn more about how to exclude pages from being indexed, take a look at The Web Robots Page and Google's Webmaster Central Blog on using the robots meta tag.
  3. Use your community to fight spam: What is SPAM and what is legitimate content? As good as robots might be at creating SPAM, humans beat them by orders of magnitude at detecting SPAM. As your community profits most from your Wiki, you should invite the community to join your spam fighting efforts. This means regularly observing the "Recent Changes" page, skimming through changes and change descriptions (SPAM robots seldom use change descriptions that fit the usage patterns of your wiki), and reverting spammed pages to a clean revision. By selecting a Wiki software that has a "revert" or "rollback to last revision" feature, you are giving your users a powerful weapon in the fight against robots, because they can be faster in spotting the SPAM and clicking the link than most robots. If wiki spam is a major nuisance for you, you should engage in the Chongqed community, which is devoted to fighting SPAM in Wikis and retaliating against spammers (which I doubt is worth the effort). If you do not have a community that can help you fight SPAM, you should probably disable editing in the Wiki or shut it down completely. Without a community, you will lose interest sooner or later as well, but spammers will continue to find your Wiki an attractive target.
  4. Ban content, not users: Lots of spam fighting techniques involve some way of banning certain requests, based on user agent, time of day, frequency of access, IP address range, etc. Other techniques require registration or use CAPTCHAs. All these techniques have a number of disadvantages. First, they create false positives, e.g. blocking legitimate edits that just happen to use the wrong user agent, time of day or IP address range. Second, some of them, like CAPTCHAs and required registration, raise the barrier to contribution, so many users will not even try to contribute to your Wiki. Finally, they can easily be circumvented by a clever spammer. IP address based blocks in particular can be circumvented by using open proxies, dynamic IP addresses or botnets. The only thing that spammers cannot disguise is their intent to create links with specific targets and keywords in your Wiki. The most effective techniques are therefore based on banning content. This means banning URLs based on regular expression patterns (you do not have to build a database of these patterns yourself, there is an excellent one available at http://blacklist.chongqed.org/), banning text in the Wiki based on regular expression patterns, e.g. for keywords (this will be more difficult if your wiki is devoted to gambling or erectile dysfunction medication), or even banning based on the number of URLs posted in one editing step or the URL-to-other-content ratio of the post.
  5. Stay up to date: Staying up to date means keeping up to date with the version of your Wiki software, which might not only fix bugs and add interesting new features, but also introduce new mechanisms to fight SPAM. And staying up to date means keeping up to date with new techniques used by spammers and ways to fight them. Good resources are the C2 Wiki (THE original Wiki) and the Chongqed Wiki.
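
The content-based checks from rule 4 can be sketched in a few lines of Python. This is only an illustration under assumptions: the blacklist patterns, the two thresholds and the function name spam_reasons are made up for this example and are not taken from any particular Wiki software or from the Chongqed blacklist. A real filter would load a maintained pattern list and would probably look only at the text added by an edit.

```python
import re

# Hypothetical blacklist patterns, for illustration only.
URL_BLACKLIST = [re.compile(p, re.IGNORECASE)
                 for p in (r"cheap-pills\.example", r"casino[-\w]*\.example")]
KEYWORD_BLACKLIST = [re.compile(p, re.IGNORECASE)
                     for p in (r"\bviagra\b", r"\bonline casino\b")]

# Rough URL matcher; good enough to count links in wiki text.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\]]+", re.IGNORECASE)

# Assumed thresholds; tune them to the usage patterns of your wiki.
MAX_URLS_PER_EDIT = 5
MAX_URL_RATIO = 0.5


def spam_reasons(text):
    """Return a list of reasons why the submitted text looks like link spam."""
    reasons = []
    urls = URL_PATTERN.findall(text)

    # Ban URLs that match a blacklisted pattern.
    for url in urls:
        if any(p.search(url) for p in URL_BLACKLIST):
            reasons.append("blacklisted URL: " + url)

    # Ban text that matches a blacklisted keyword pattern.
    for pattern in KEYWORD_BLACKLIST:
        if pattern.search(text):
            reasons.append("blacklisted keyword: " + pattern.pattern)

    # Ban edits that post too many URLs at once.
    if len(urls) > MAX_URLS_PER_EDIT:
        reasons.append("too many URLs")

    # Ban edits that consist mostly of URLs.
    url_chars = sum(len(u) for u in urls)
    if text.strip() and url_chars / len(text) > MAX_URL_RATIO:
        reasons.append("URL-to-content ratio too high")

    return reasons
```

An edit would be rejected whenever spam_reasons returns a non-empty list, and the reasons can be shown to the user so that a legitimate contributor knows what to change.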

Similar rules apply to other kinds of social software that allow user-generated content, especially blogs and social networks, but depending on your application the motivations and techniques of the spammers might vary.

Maven news

posted 09:23AM Nov 20, 2007 with tags eclipse maven opensource softwaredevelopment by Lars Trieloff

It is good news that the Eclipse Integration for Apache Maven has been approved and that there will be official Maven integration in Eclipse, based on the work done by Carlos and others for Q4E, really soon. (via Carlos).

In other news - if you are using Maven for your Java builds, which you should, have a look at Brett Porter's slides on Maven Best Practices from ApacheCon. (via Steve)

RESTful Web Services

posted 10:22AM May 25, 2007 with tags architecture books rest softwaredevelopment by Lars Trieloff

Jon Udell reviews RESTful Web Services, an O'Reilly book by Leonard Richardson and Sam Ruby. I buy the book. That's what a review should look like. I do not need to read reviews of books I am not going to read.

Everything I learned about Javascript, I learned on the web

posted 10:39AM May 22, 2007 with tags javascript softwaredevelopment webdevelopment by Lars Trieloff

Yesterday, during an interview, I commented on the fact that, according to Ohloh, our application is written mostly in Javascript: "Javascript is an impressively elegant, but often misunderstood language." The applicant answered that this was the first time he had heard someone say that, but recent signs show I am not alone in my opinion:

The interesting thing is: I have been developing in Javascript for nearly all of my programmer's life, but have never owned a single Javascript book. This is partly due to a misunderstanding of the language ("It's just a toy language for scripting web pages", which it is not) and partly due to the great resources on the web for developing Javascript.

With more and more applications built upon Javascript, server-side, client-side in web sites, in rich internet applications based on Firefox, Thunderbird or XULRunner, or in Actionscript, developers will have to learn about the beauty of Javascript, but also about its dark sides.


The Joy of Bugfixing

posted 08:21PM Mar 22, 2007 with tags opensource softwaredevelopment softwarequality by Lars Trieloff

Jan, an intern at Mindquarry, spent the last two days fixing a bug in Cocoon's AJAX form handling. In the end Alexander and I joined him because we had more experience with Cocoon's internals, and finally we found and fixed the problem. Having identified a bug, but not the bug's cause or resolution, is a terrible feeling. You try this approach, you try that approach, you discuss, consult search engines, and feel more and more incapable of understanding the problem. But once you have found the solution, you feel great, you feel able to grasp the most complex technical structures.

I keep saying "Software development is 90% banging your head against the wall and 10% breaking through the wall". Bug fixing multiplies this principle. 99% of the time, the actual bug fixing, searching for the cause and solution, makes you feel bad, but the 1% of the time when the bug is finally fixed is simply great. The joy of bugfixing outweighs the pains of fixing the bug. This is why bug days, when a whole development community spends a day fixing bugs, are so popular.

A Java API for REST

posted 11:58AM Feb 15, 2007 with tags api cocoon java rest softwaredevelopment xml by Lars Trieloff

There is a new JCP proposal aiming to create a Java (TM) API for RESTful Web Services. The proposal was submitted by Sun and is supported (among others) by Jérôme Louvel, creator of the RESTlet Java framework.

Interesting Commentary:

Pete Lacey:

"I hope they don’t screw it up (see JAX-WS)."

Marc Hadley provides a code example of what he thinks the API might look like. It looks good from my point of view.

Steve Loughran:

"It seems to me the people who have a better idea of what to do are the Cocoon folk and Team Netkernel, not the WS projects. Yet I suspect it will be the latter is the most interested, because clearly REST is winning the battle for hearts and minds, at least outside the enterprise. The trouble is, work on WS too long and you get corrupted, you start thinking of methods and operations, not remote state."

Indeed, as we are building REST applications with Cocoon, adhering to or integrating this API is an interesting point to watch.

Stefan Tilkov:

"First, we asked why he (Marc Hadley) feels a REST-specific JSR is needed, i.e. why Servlets and JSPs are not enough. We also questioned the spec’s wording about low-level APIs, and how one would go about developing RESTful web apps without a deep understanding of HTTP issues and design patterns. Marc replied that while the current APIs provide broad support for HTTP, they leave a lot of work to the developer that could be automated in a higher level API"

His post contains lots of other good references and quotes.

Top 10 Reasons to upgrade to Eclipse 3.3 M5 right now

posted 02:18PM Feb 11, 2007 with tags eclipse java softwaredevelopment tips by Lars Trieloff

Eclipse's latest milestone for the 3.3 release has been out for two days, and these are my top-10 reasons to upgrade:
  1. SWT libraries automatically found (since M4) - this eases deployment of SWT applications dramatically. No more setting java.library.path, just one single dependency, easily expressed as a Maven 2 dependency
  2. System tray support added on Mac OS X (since M1, somewhat workable since M4 - your starting class has to be in package org.eclipse.swt) - Now there is real cross-platform support for tray icons, notification area icons or menubar items
  3. Code clean up on save (since M3) - makes it easy to adhere to coding conventions without much manual formatting
  4. Text drag and drop in text editors (since M5) - very useful for manual reordering of code
  5. Advanced tooltips (since M4) - tooltips can contain more than text
  6. New DateTime control (since M3) - date and time entry with improved usability
  7. Mozilla Everywhere (since M5) - when you need a predictable embedded web browser control and cannot rely on the operating system's default
  8. Improved completion in annotations (since M2) - good for users of libraries like DAX that use annotations.
  9. Working sets for the Project Explorer (since M2) - working set support for non-Java projects
  10. Apply Patch offers full context patch preview (since M2)
There are numerous other improvements in these five milestones, but these are the features I like most and that convince me to upgrade to Eclipse 3.3M5 and use SWT 3.3M5 and JFace 3.3M5 for desktop GUI development.

Spring modules for JCR

posted 02:59PM Feb 10, 2007 with tags jackrabbit java jcr softwaredevelopment spring by Lars Trieloff

When you are building applications using Spring and JCR with a repository like Apache Jackrabbit, the newly released JCR Spring module (see org.springmodules: spring-modules-jcr 0.7) might be for you.

It contains Spring FactoryBeans that allow you to access JCR repositories directly, via RMI or JNDI.

More Testing Means Better Debugging

posted 10:44AM Feb 08, 2007 with tags debug softwaredevelopment softwarequality test by Lars Trieloff

Cedric Beust, lead developer of TestNG, writes in "More testing doesn't mean less debugging":
This myth that "testing means no more debugging" needs to die. Seriously. Anyone making this claim is doing a lot of damage to the software engineering world. Just as we are finally getting good IDE's, good debuggers and, more importantly, an increasingly widespread conviction in the developer community that these tools are part of a healthy software engineering process, a new vanguard of smug programmers come out of the woodwork with their superior attitude and resurrect the old mantra that only bad developers use debuggers.
Cedric goes on to point out how the debugger has become an invaluable tool for software development: it allows him to verify his assumptions during development and to examine the runtime behaviour of the software, as the pure static source code is just a description of a software system.

The value of unit tests in regard to debugging is that they allow more structured, repeatable debug situations. I use test cases to create situations that need further verification and can then step through the program at runtime.
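
As a minimal sketch of what I mean, here is a test case written in Python's unittest module. The function parse_tags and its input are hypothetical and only stand in for the code under investigation. The test pins down one exact situation, so the same state can be reproduced and stepped through as often as needed.

```python
import unittest


def parse_tags(line):
    """Hypothetical function under investigation: split a tag line into tags."""
    return [tag for tag in line.strip().split(" ") if tag]


class ParseTagsSituation(unittest.TestCase):
    def test_repeated_spaces(self):
        # This test reproduces one specific input that needs verification.
        # To step through it, set a breakpoint on the next line in the IDE
        # or insert a call to breakpoint() before it.
        tags = parse_tags("jcr  modeling   tips ")
        self.assertEqual(tags, ["jcr", "modeling", "tips"])
```

Running the module with python -m unittest executes the test. Running the same test under a debugger stops at the breakpoint with exactly the same input every time.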

Eclipse Bundles in the Maven Repository

posted 10:18AM Feb 08, 2007 with tags eclipse maven softwaredevelopment by Lars Trieloff

Carlos Sanchez points to the new Eclipse dependency packages in the Maven repository, which are automatically generated using the Apache Felix Maven bundle plugin. Carlos is working on improving the bidirectional mapping of Maven project object models and OSGi bundles.

Eugene Kuleshov comments:

With number of Eclipse bundles and its frequent release cycle, the idea if such repository don't really makes sense, maybe except for the Equinox itself. For the rest of bundles it is more appropriate to use Eclipse install as a repository of special format (it is called Target Platform in Eclipse's own build).

I cannot agree. When you are building SWT applications or RCP applications outside Eclipse, you cannot rely on Eclipse's build system. There needs to be an external build system, and you cannot rely on having Eclipse installed as the target platform to provide all dependencies.
