DB4ALL: reformatting the mess that the Internet has become

I always try very hard to keep my posts within the main topic of this blog, namely computers in the context of building automation and simulation. Occasionally I fail, as with today’s post.

I’d like to tell you about a software company co-founded by a friend and fellow Toastmaster of mine, David Portabella. The company’s name is DB4ALL, and they specialize in software for retrieving structured data from the web.

(Disclaimer: I am not affiliated with this company. I have had the opportunity to play with their tool, which I sincerely think is a high-quality one, but I derive no remuneration from writing this piece.)

They’ve developed ‘Webminer’, a Java library for extracting structured data from any website. Suppose, for instance, that you need a relational database with the data from the CIA World Factbook. That data, though in the public domain, is not available as a relational database; the only way to get at it is by clicking around the CIA website. With ‘Webminer’, however, the smart guys at DB4ALL can write a custom application that knows how to navigate such websites, ‘scrape’ and ‘normalize’ their data, and save it to a relational database for you.
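To give a feel for what ‘scraping’ and ‘normalizing’ mean, here is a minimal sketch in plain Java. It is emphatically not Webminer’s API, which I am not reproducing here; the class `ScrapeSketch`, its methods, the sample HTML and the `facts(country, field, value)` table are all hypothetical. It assumes the page lays out label/value pairs as a simple two-column HTML table, pulls those pairs out with a regular expression, and turns each one into a row of a relational table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of "scrape and normalize" (NOT Webminer's API).
 * Extracts label/value pairs from a simple two-column HTML table and turns
 * them into INSERT statements for a table facts(country, field, value).
 */
public class ScrapeSketch {

    // Matches <tr><td>label</td><td>value</td></tr>, tolerating whitespace between tags.
    private static final Pattern ROW = Pattern.compile(
            "<tr>\\s*<td>(.*?)</td>\\s*<td>(.*?)</td>\\s*</tr>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Scrape: returns (label, value) pairs, trimming whitespace and a trailing colon from labels. */
    public static List<String[]> extractPairs(String html) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            String label = m.group(1).trim().replaceAll(":$", "");
            String value = m.group(2).trim();
            pairs.add(new String[] {label, value});
        }
        return pairs;
    }

    /** Doubles single quotes so the text can sit inside an SQL string literal. */
    static String sqlQuote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    /** Normalize: one (country, field, value) row per scraped pair. */
    public static List<String> toInserts(String country, List<String[]> pairs) {
        List<String> statements = new ArrayList<>();
        for (String[] p : pairs) {
            statements.add("INSERT INTO facts (country, field, value) VALUES ("
                    + sqlQuote(country) + ", " + sqlQuote(p[0]) + ", " + sqlQuote(p[1]) + ");");
        }
        return statements;
    }

    public static void main(String[] args) {
        String page = "<table>"
                + "<tr><td>Capital:</td><td>Bern</td></tr>"
                + "<tr><td>Currency:</td><td>Swiss franc (CHF)</td></tr>"
                + "</table>";
        for (String sql : toInserts("Switzerland", extractPairs(page))) {
            System.out.println(sql);
        }
    }
}
```

Running `main` prints two statements, the first being `INSERT INTO facts (country, field, value) VALUES ('Switzerland', 'Capital', 'Bern');`. A real scraper would use a proper HTML parser rather than a regex, and JDBC `PreparedStatement`s rather than hand-quoted SQL; the point here is only the shape of the transformation, from a page meant for humans to rows meant for a database.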

On DB4ALL’s website you will find references to the two most popular datasets that they’ve mined: the above-mentioned CIA World Factbook, and the SourceForge database of open-source projects. Having such data in relational form is invaluable for any researcher or marketing analyst. Suppose, for instance, that you want hard data on how the popularity of different programming languages in open-source projects has changed over time. Well, with these datasets you have all you need to get started.

This, for instance, is a screenshot of the SourceForge dataset opened in Excel:

All in all, if you need publicly available data from a website stored in a relational database, you should definitely consider using DB4ALL’s services.

Posted on July 12, 2010 at 11:09 am by lindelof
In: Announcements
