InfoTools Survey Results

Yesterday, I gave a talk on InfoTools: Beyond Search at TiE Chennai. The slides of the presentation are here. I think it went well, but if I had cut down the slides and the talk and given more demos, it would have gone even better. Perhaps next time.

Before starting the talk, I requested people to give me (written) answers to three questions:

  1. What are your information needs?
  2. What are your problems with information?
  3. What tools do you use to manage information?

The questions were perhaps a bit vague. I realized that after going through the answers. They varied in their level of granularity (specific vs generic problems) and in the definition of information itself. But here they are (slightly modified to reduce redundancy).

Here is what I got from the survey:

What are your information needs?

  1. Potential customer info
  2. Right info at the right time (whenever I need it)
  3. To structure, unstructured web data
  4. Technology and Process to manage large newspaper portal
  5. Should be current, relevant (to my context). Should lead to (help?) actual decisions
  6. I need reference (information) of various consultants relating to start of business viz cost, web, management etc.
  7. Need for current information
  8. Need (to handle?) information from multiple sources and formats
  9. Collating information from multiple sources
  10. Information about competition
  11. About marketability and segments
  12. Company address information
  13. Company finance (annual reports)
  14. Executives within the company
  15. Trade details, products, services etc.
  16. Sales leads
  17. Knowledge enhancement
  18. Learning about old friends/acquaintances/family
  19. To learn to grow personally & business
  20. Needs to be local search for providers near to me (for ex: a photo copier shop near to my house)
  21. Technical solutions (day to day) for career and personal growth
  22. My business is providing information based services, package with recommendations. So need for information varies.
  23. Various technologies in market
  24. Information about market situation
  25. About stock/companies performance
  26. Details/support to solve issues
  27. Products available in the market for specifics(?)
  28. Focused News
  29. Similar business entity info
  30. Public info of competitor
  31. At a business level – market feelers about demand, ease of vendor options availability
  32. At an execution/implementation level – latent trends in tech
  33. Updated knowledge
  34. Price information about products etc.
  35. Information about technology
  36. Looking for acquiring an IT company. Need info on the industry they are in (macro) and more about that company (micro)
  37. Collect, compile for pattern understanding, plan for target customer
  38. Top IT temp staffing companies in India
  39. Total temp staff in IT in India
  40. How do I know the customer needs
  41. Scholarly articles on business entrepreneurship
  42. Product information, addresses from www
  43. Collecting/harvesting data from websites and collating, cleansing and delivering to clients
  44. Where is the resource for information?
  45. Where info is available, how to get data stream into our database
  46. How cost effective, credible, valuable is the data
  47. Accessibility
  48. About companies wanting to enter India -setup operations, joint ventures
  49. Companies in India wanting to enter other geographies
  50. Consultants from outside India needing partners in India
  51. Relevant, accurate data (specific to the task at hand)
  52. Info about prospective customers
  53. Info about vendors
  54. Info about current market
  55. Info about latest technology

Here is my list of information requirements (I took the survey along with the others):

  1. Leads
  2. Trends
  3. Best practices

What are your problems with information?

  1. Locating the right data at the right time
  2. At times info overload
  3. Unable to get the right (specific) information
  4. Sometimes get caught into loads of data, making it difficult to sift through
  5. Credibility, cost and accessibility
  6. Frequent website updates
  7. Different formats of information
  8. Getting data from complex templates and grouping into finite categories
  9. Precision, very difficult to get objective information
  10. Currency of data
  11. Comprehensiveness of data
  12. Need continuous monitoring
  13. Information overload and in such case, synthesizing & assimilating that information in a reasonable time frame is difficult
  14. Old data, not accurate
  15. Too much info
  16. Not easily accessible
  17. Irrelevant info
  18. Filter out the actual/real info from a large pool of junk data
  19. Do not have a scope to interact with peers in similar industries
  20. Direct actionable information takes several searches, navigation
  21. How to localize information (assume how to get local information) and get reliable info
  22. How to segregate info from the web
  23. Difficult to put together
  24. If put together, not sure whether it is the updated info
  25. If updated (up to date?) not sure about the integrity of the data source
  26. Availability (sources), Reliability (sources)
  27. Aggregation of data in a presentable manner
  28. Too much information
  29. Unable to identify precise locations quickly
  30. Quality of inputs not high (always)
  31. Too large varied and different
  32. Formats (word, pdf, excel etc. ), hard copies, books, magazines
  33. Difficult to authenticate, collate and organize based on requirement
  34. I like websearch engines but I strongly believe that these search engines are at a nascent stage. I just don’t need a site coming up in my search because it is in wikipedia or yahoo
  35. Inappropriate not timely
  36. Have to go through lots of notes/documents/pages to get a single piece of information
  37. Validating the information
  38. Storing and organizing information
  39. Time
  40. Where to see (sources?)
  41. Not a centralized reporting
  42. Assimilation requires a lot of pre-formatting
  43. Effective and speed search by everyone not followed
  44. Not sure what to look for, where to look for and how to get it
  45. Vast, use software to target timely, quick, on realtime
  46. Not able to source the information in the web
  47. We develop products based on blogs and emails. This is not enough.
  48. Too much info
  49. Info with noise

My List

  1. Signal vs noise
  2. Reliability
  3. Authenticity

What tools do you use?

  1. Blog, forums
  2. Google, web search
  3. Search engines
  4. Reliable third parties
  5. Friends
  6. Regular expressions
  7. Use bookmarking tools like delicious, share with team
  8. Knowledge repositories (wikipedia)
  9. Books (online/printed)
  10. Inhouse tools to capture through automation
  11. Infosource – www, infoanalysis – spreadsheets
  12. Search engines to identify information
  13. Customized perl/php/vb.net programs to manage
  14. Scrape information from the web and manage it
  15. Search engines
  16. Networking sites (LinkedIn etc)
  17. Forums
  18. Email
  19. My brain power, word/excel
  20. justdial and few others provide localized service over phone but it is not so accurate
  21. Justdial
  22. Hakia
  23. None
  24. Excel/Computer/Notebooks
  25. Peer discussions
  26. IE Favorites (browser bookmarks)
  27. Bing
  28. Primary Research
  29. Internet, newspapers, meeting – software modules
  30. spreadsheet, email
  31. Internet, libraries
  32. Getting logic from other tools and using our own tools or languages
  33. Perl, regex
  34. Paid portals
  35. LinkedIn
  36. Spoke
  37. Ecademy
  38. Xing
  39. My memory (sigh)

What I use:

  1. Social bookmarks (delicious, stumble upon)
  2. Twitter Search
  3. Facebook groups
  4. LinkedIn Groups and Answers
  5. Custom search
  6. Blog/Feed Search
  7. Twine
  8. Semantic Search engines
  9. InfoMinder
  10. InfoStreams (feed aggregator/search)
  11. InfoPortals (just started)
  12. Tag clouds (generated)
  13. Concept Mapping tools
  14. OpenCalais
  15. Zemanta
  16. Wikis

This is a small sample (about 40+ people who attended my talk). But you can see some patterns. I think we have a long way to go beyond search.
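
A rough way to look for such patterns is to tally the answers by theme. The sketch below is only an illustration: the sample answers are a handful copied from the problems list above, while the theme names and keyword lists are assumptions made for this example, not part of the survey.

```python
from collections import Counter

# A handful of answers copied from the "problems with information" list above.
ANSWERS = [
    "At times info overload",
    "Too much info",
    "Old data, not accurate",
    "Currency of data",
    "Irrelevant info",
    "Validating the information",
    "Too much information",
    "Info with noise",
]

# Hypothetical themes and keywords, chosen for illustration only.
THEMES = {
    "overload": ["overload", "too much"],
    "freshness": ["old data", "currency", "updated"],
    "relevance": ["irrelevant", "noise", "junk"],
    "trust": ["validating", "credibility", "reliab"],
}


def tally_themes(answers, themes):
    """Count how many answers mention at least one keyword of each theme."""
    counts = Counter()
    for answer in answers:
        text = answer.lower()
        for theme, keywords in themes.items():
            if any(keyword in text for keyword in keywords):
                counts[theme] += 1
    return counts


counts = tally_themes(ANSWERS, THEMES)
for theme, n in counts.most_common():
    print(theme, n)
```

Simple keyword matching like this misses answers phrased differently, so the counts are best read as a starting point for reading the answers, not as a result.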

Blogs – Keeping Content Current

I watch my blog stats once in a while (used to be a daily routine once). I notice that there are lots of readers for some of my old posts even though some of them may be outdated. When I noticed this, I stopped covering news on my blogs. I even wrote a post titled Does Currency of Information Matter?

Once in a while, I feel compelled to revisit a post and make some minor adjustments: add some new links, or add a comment pointing to updated information. I have been thinking about keeping my posts (at least a few of the popular ones) updated. Here are some ideas.

  1. Keep updating the posts with current information when relevant, but keep a copy of the old version. Obviously we need to use the same permalink (so that people see the latest version), but does that violate the concept of a permalink?
  2. Keep the old post but change the beginning to link to a new version of the post
  3. Write a new post and give it a new version number, with a link to all the old versions (in case people want to follow entries and comments)
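
The first idea (one permalink, with older versions kept around) can be sketched as a small data structure. This is a minimal sketch, not how any particular blogging tool works: the `Post` and `Version` classes, the permalink, the post text and the dates are all hypothetical names and values made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Version:
    body: str
    updated_on: date


@dataclass
class Post:
    permalink: str                      # stays the same across all versions
    versions: list = field(default_factory=list)

    def update(self, body, updated_on):
        """Add a new version; earlier versions are kept, never overwritten."""
        self.versions.append(Version(body, updated_on))

    def current(self):
        """Return the latest version (what a reader sees at the permalink)."""
        if not self.versions:
            raise ValueError("post has no versions yet")
        return self.versions[-1]

    def history(self):
        """Return all earlier versions, oldest first."""
        return self.versions[:-1]


# Hypothetical permalink, text and dates, for illustration only.
post = Post("/example-post")
post.update("First version of the post.", date(2009, 1, 1))
post.update("Second version with newer links.", date(2009, 7, 1))

print(post.current().body)
print(len(post.history()), "older version(s) kept")
```

The permalink always resolves to `current()`, while `history()` keeps the older text available for anyone who wants to follow how the post changed.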

I may try all these different techniques. I am sure that someone has already solved this problem. If you did, can you share it here? If not, any thoughts?

It is a River…

“It is a river, not a reservoir” from A guide to the overwhelmed: Part-II. Rob articulates so well what most of us feel.

One of the greatest and most depressing moments of enlightenment that ever dropped itself on me was the realization that I was not going to learn everything I wanted to learn in my lifetime. I was not going to do all the things I wanted to do.

I have stopped worrying about it now. I read a bit, blog a bit, talk about it a lot and think about it. I feel happy when I get a few ideas and dream about a day when I can implement them.

via Stephen’s OLDaily.

Web Data Mining

One of my articles on Web Data Mining appeared in i.t.magazine. They were kind enough to permit me to make it available from my blog.

Almost all of us need information. A lot of information is freely available on the Web. Learning a few techniques on how to mine information on the Web is a useful skill. Here are some sample usage scenarios:

  • You are an entrepreneur who is planning to start a new software business. You hear that Web 2.0 and social applications are hot. You want to do some research to understand the marketplace, and want to prototype a few product ideas.
  • You are part of the CTO office of a software company, and are interested in short-, medium-, and long-term technology and business trends in your industry. You need this information to build skills in your organization, and to build a few concept prototypes.
  • You are part of the CIO office of an organization. You need to balance early adoption of technologies with providing a stable environment for your business; you don’t want to jump at every new technology. In addition to finding new tools and techniques, you also want to understand the risks and the maturity level of these technologies, which ones are being used for building applications, and you also want to track many non-technical factors.
  • You are an outsourcing company and want to find customers for your business and track trends in outsourcing. Being a jump ahead of your competition and carving a niche are important differentiators.
  • You are part of HR, or a Learning Officer, and need to plan for the skill development of your employees. You want to keep your software team happy and so need to know the latest technologies, tools and resources to plan training and skill development.
  • You are a development lead, and need to provide the team with the latest information on product releases, and access to product/technology knowledge bases. You need to know of any problems, including security issues, in the tools or software that you are currently using for your projects.

Broadly, there are several components to finding, using and sharing information.

  • Identifying and discovering information sources
  • Tracking information from various sources and filtering them for their relevance to your needs
  • Organizing collected information and sharing it with others
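
The tracking and filtering step above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the item titles, sources and keywords are made up for illustration, and counting keyword matches is just one simple stand-in for "relevance to your needs".

```python
def relevance(text, keywords):
    """Count how many of the keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for keyword in keywords if keyword.lower() in lowered)


def filter_items(items, keywords, min_score=1):
    """Keep items whose title matches enough keywords, best matches first."""
    scored = [(relevance(item["title"], keywords), item) for item in items]
    kept = [(score, item) for score, item in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in kept]


# Hypothetical items collected from different kinds of sources.
items = [
    {"source": "news", "title": "Outsourcing trends for software services"},
    {"source": "blog", "title": "Weekend recipes"},
    {"source": "wiki", "title": "Web 2.0 and social applications"},
]

# Hypothetical interests of the reader.
keywords = ["outsourcing", "trends", "web 2.0"]

for item in filter_items(items, keywords):
    print(item["source"], "-", item["title"])
```

Raising `min_score` makes the filter stricter. Real sources are noisier than titles alone, so a filter like this is only a first pass before reading.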

Information sources can be many. A few typical categories are listed below:

  • News sources
  • Company websites
  • Blogs
  • Search engines
  • Wikis
  • Discussion groups
  • Social bookmarking sites
  • Social networks

web-information-sources.jpg

This article (webdata-mining.pdf) describes these sources and their significance in more detail (the article uses British spelling, which is common in India).

Web Information Sources

Here is the mind map of various web information sources. This is not an exhaustive list. I will have a few posts following that describe each one of these in more detail.

web-information-sources.jpg

Look at this entry for some contextual information.

Update Jul 1, 2009

There are a whole host of new sources. So I will add them to comments and try to update this mind map once in a while.

Here are some:

  • Freebase is a social database of open data.
  • Twine is a smart way to keep track of information and share it with others. It goes beyond simple bookmarking.
  • data.gov is a fabulous source of US government information. I will try to find and add similar resources for other governments.

Conversational Writing

Here is a nice post on why Conversational writing kicks Formal Writing’s Ass.

What most people mean when they say “write the way you talk” is something like, “the way you talk when you’re explaining something to a friend, filtering out the ‘um’, ‘you know’, and ‘er’ parts, and editing for the way you wish you’d said it.”

It makes sense. Your writing style is influenced by what you read. But email changed all that. Email is mostly conversational. And blogs are conversational too. A lot more of what I read nowadays comes from blogs. If conversational style is good, and blogs are mostly conversational, it makes sense to encourage more blogging in an organization so everyone can communicate more effectively.