How do you use User Ratings?

I was thinking about user ratings on products again; I last posted about them here. It is a fascinating topic. We all make choices, but how?

One page mentioned a comic that is relevant not only to the subject but also, by coincidence, to Hurricane Sandy: TornadoGuard


  • Oct 12, 2013: On the “A Bayesian view of Amazon Resellers” blog post by John D. Cook the comments are very interesting. A lot of smart people in the world! Anyway, one commenter, Ian Maxwell, mentioned that one could use the Rule of succession for these kinds of problems.
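As a rough illustration of the rule of succession mentioned in that comment, here is a minimal Python sketch. It assumes each rating has been reduced to "positive" or "negative", and the seller names and counts are made up for the example. With s positive ratings out of n, the estimate that the next rating is positive is (s + 1) / (n + 2).

```python
from fractions import Fraction

def rule_of_succession(positive: int, total: int) -> Fraction:
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    if not 0 <= positive <= total:
        raise ValueError("positive must be between 0 and total")
    return Fraction(positive + 1, total + 2)

# Hypothetical sellers: (positive ratings, total ratings)
sellers = {"A": (10, 10), "B": (95, 100), "C": (0, 0)}

# Rank sellers by the estimated chance that the next rating is positive.
ranked = sorted(sellers, key=lambda s: rule_of_succession(*sellers[s]), reverse=True)
for name in ranked:
    print(name, rule_of_succession(*sellers[name]))
```

Note that seller A, with a perfect 10 out of 10, ranks below seller B with 95 out of 100, because ten ratings are weaker evidence than a hundred. A seller with no ratings at all starts at 1/2.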


  1. Collective Choice: Rating Systems
  2. How do you rate user ratings?
  3. TornadoGuard: The Problem with Averaging Star Ratings
  4. 5 star ratings. Bayesian or Weighted average?
  5. How Not To Sort By Average Rating
  6. What is the Rating Average and how is it calculated
  7. Algorithm for Rating Objects Based on Amount of Votes and 5 Star Rating
  8. Rating Scale
  9. A Bayesian view of Amazon Resellers
  10. Bayesian average
  11. Brewing a Better Rating System
  13. Collaborative filtering
  14. ID3
Should Consumer products indicate comparable filler content?

You’re at a store and are presented with a choice between two competing products. Let’s say it’s a bottle of juice or some dish soap. How do you decide what to buy?

One way is to compare the relative cost per measure. That calculation is already done for you on that little price tag on the shelf (I can’t recall the name of that standard). So, price comparison is easy. You could even look at the ingredients; they are usually listed in order of quantity. There are even mobile apps to help you make that decision.

However, that decision is bogus since you don’t really know how much of that product is just filler. Which juice has the most water, for example? Some products will state what that is, like 2% real juice. Is that enough? What do they mean by that? Do they dry out the real juice, measure it, then reconstitute it back into liquid form? I think it’s like that “cheese food” label, all a scam. Boy, am I being negative this week.

Now companies have a right to trade secrets and all that. But as consumers, we would like to know when we are just buying colored water. Or maybe we don’t. After all, we waddle around with our fat asses in the big box stores searching for deals on junk food to keep the billion dollar soft sweet drink industry going.
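To make the point concrete, here is a small Python sketch of how the comparison changes once filler is taken into account. The prices, sizes, and juice percentages are invented for illustration, and it assumes the label percentage can be taken at face value, which is exactly what is in doubt.

```python
def unit_price(price: float, volume: float) -> float:
    """Price per unit of volume, the figure shown on the shelf tag."""
    return price / volume

def price_per_real_unit(price: float, volume: float, real_fraction: float) -> float:
    """Price per unit of non-filler content (e.g. actual juice)."""
    if not 0 < real_fraction <= 1:
        raise ValueError("real_fraction must be in (0, 1]")
    return price / (volume * real_fraction)

# Hypothetical products: (price in dollars, ounces, fraction that is real juice)
drink_a = (2.00, 64, 0.10)
drink_b = (4.00, 64, 1.00)

for name, (price, volume, fraction) in {"A": drink_a, "B": drink_b}.items():
    print(name,
          round(unit_price(price, volume), 4),
          round(price_per_real_unit(price, volume, fraction), 4))
```

On the shelf tag, drink A looks half the price of drink B. Per ounce of actual juice, under these made-up numbers, it costs five times as much.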

Anyway, there must be some better ways to make our devalued earnings buy a little more.

  1. Toward a Consumer Product Information Resource
  2. 1862 – 2012: A Brief History of Food and Nutrition Labeling
  3. Food Ingredients Most Prone to Fraudulent Economically Motivated Adulteration

Duplicate files tool? Free, but your credit card is …?

The web made sales games a lot easier, in particular the bait and switch.

I just resuscitated an old hard drive, bought an external enclosure for it.  Hooked it up.  Nice.  Now I want to clean it up.  Does it have stuff I may be missing from other drives?  Can’t tell, so many files and many duplicates.

So, the first step is to remove the duplicate files.  Do a web search for Win7 duplicate file utilities.  What a mess.  It’s hard to figure out what is a legitimate offering.   I opened a forum where the question is asked, what is a good duplicate file finder.  One response was that “XYZ Duplicate File Finder” (I changed the name of the actual tool) was easy and free.  Did the developer of the software post it?  Is it really free?

I visited the site.  xyz_site.  Well, it’s a ‘com’ site, but that can mean anything.  For example, they can sell their product, but give away crippled versions or older products for free, like winzip.com.   The thing is, nowhere on the site does it state the price.  If you dig into the site there is a page that indicates you have to pay for it:  the help-register-xxx.html   Yet, even here there is no price!  In fact, the license expires, but on the license FAQ page it says you only lose the ability to receive free updates.

I don’t mean to single out the makers of this software.  Perhaps I missed the price somewhere or misunderstood the site itself.  This type of site is very common in the Windows world.   I’m all for people making an honest buck, but the key here is honest.

Let’s look at the winzip site.  Right on the first page, it gives the price.  That’s nice.  What I don’t like about it though is that there is a download button.  What, I can download a trial or free version?  Nope, click the button and way down at the bottom of the resulting download page they tell you it’s for registered users only, which you probably won’t see unless you scroll the window.  Huh?  Why not call the “Download” button the “Upgrade” button instead?

I don’t think you see this kind of stuff in the Linux or Unix world.  Probably because people in the *nix world expect every useful utility to be free or eventually built into the OS itself (OpenSolaris has dedupe via ZFS).

I guess in the Windows world free means, not to the user, but to the sellers who are free to do whatever they want.

Now how do I remove duplicate files?  Maybe I can just write a script to do it.  I could create a database and then query for duplicates, that must be easy in SQL?  I wonder how it’s done in Linux, probably a one-line Perl script.

By the way, I had some luck with Duplicate File Finder 0.8.0 by Matthias Boehm.  Also, in the Linux world this can be done with very clever bash scripting.  Here is a sample scripting approach.
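For comparison with the script idea above, here is a minimal Python sketch of one common approach: group files by size first, then hash only the files that share a size. This is not how any of the tools mentioned works internally; the function names are my own, and it only reports duplicates, it deletes nothing.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Return groups of paths under root whose contents hash identically."""
    # Pass 1: files with a unique size cannot have a duplicate.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]

# Example use (the drive letter is a placeholder):
#   for group in find_duplicates("E:\\"):
#       print(group)
```

Checking sizes first avoids reading most files at all, which matters on a large old drive. A cautious version would also compare the matching files byte for byte before anything is removed.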

What I don’t like about the tools I’ve seen so far is that they are very bad in terms of usability.  They should look at how diff and merge tools do things, like KDiff3.  Another example to look at is the graphics approach found in D-Dupe.

Toward a Consumer Product Information Resource

Here is a quick draft I put together about an idea for a centralized prodinfo system that consumers can use.

Text Search is overused on the internet and is becoming counterproductive with all the sales ads, SEO optimized sites, junk, flashy eye candy, and self-serving misinformation.

I’m working out the parameters and scope.  There are similar ideas out there, but I didn’t see something like this.

I posted this idea to the United States Department of Commerce via the http://opengovtracker.com/ project which was asking for ideas on improving government and its services.

The ongoing draft document will be stored here:  ConProdInfo

Of course, after I “finish” this, now I’m stumbling on more interesting related info on the web.

May 10, 2010:  Working on second part of this effort.  How does a consumer actually use this information at the point of sale?
3 Mar 2010:   Found out about a related patent.  But not the same.
30 Mar 2010:   Just found out about ID@URI effort:  http://en.wikipedia.org/wiki/ID@URI and the AUTO-ID Labs at MIT: http://autoid.mit.edu/cs/

Consumer Product Information Resource

Helping consumers find usable product information

By Josef Betancourt

Created: 20100318T15:23-05   Modified:  22 March 2010  15:13

Dedicated to Mr. Bloom, math teacher at P.S. 123 in Brooklyn, NY


Finding product information, especially for the consumer, can be difficult.  Presented is a system that, using a proposed new law requiring the registration and labeling of a product with an associated network addressable identifier, provides an information resource a consumer can access.  This new resource will be a central point of authoritative third-party complete and durable information and can serve as a conduit to related data.



  1. Information Problem
  2. Solution
  3. Advantages
  4. Product Range
  5. Standards
  6. Information Content
  7. Information Creation
  8. Information Representation
  9. Funding
  10. Organization
  11. Requirements
  12. Related Efforts
  13. Conclusion
  14. Links

1. Information Problem

Most products (trade items) come with information already printed on their container or packaging.  Many do not.  For example, one particular air freshener for an automobile does not have any information on its ingredients, just a sales pitch.  There is also the possibility that the information can be separated from the product.  Consumers may throw away the packaging, damage affixed information, or misplace included pamphlets.

When we need information on a product, which could be anything such as a toy, an over-the-counter drug, or a nutritional supplement, the conventional internet-based approach is to use a search engine to produce a search results page.  With today’s search engines, the first results are usually very usable.  In one recent set of search results, there were two ads at the top, clearly labeled as ads.  The first link after them was a Wikipedia entry, which was rather informative, with plenty of cautions and warnings.  From Wikipedia we follow other links, such as to drugs.com.  Then we go back to the search page and continue to examine the search results, or edit the search terms and try again.

The problem occurs as we continue to examine the linked pages on the search results page.  Soon we find ourselves trudging through the web, where the plethora of information sources, formats, and points of view becomes overwhelming noise.  In many cases the information is at the wrong level, locked behind subscriptions, or commingled with advertising.  The web thus becomes a perfect avenue for misinformation.  This is even more dramatic on some product manufacturers’ sites, where one can be inundated with fancy animations, movies, and ads whose intent is to sell the product and not really to provide information.  Worse are all the Search Engine Optimized (SEO) pages that exist to increase page hits but do not provide additional information.

Compounding the effort to find product information is the fact that the internet is predatory. It has been described as the wild west. Thus, many hazards await the innocent and inexperienced consumer. While the diverse gimmicks used to show advertising are just annoying (though they are a theft of processor cycles), malware is insidious. If even experienced users who let down their guard are susceptible, imagine the casual user who is just trying to find out who made the widget in their closet.

Searching is also repeated, redundant effort. If millions of people have already searched for product x and found relevant information, then when we search again for information on product x we are just repeating the same work.  This costs time, energy, and personal expense.  Even when there are sites dedicated to important product information such as recalls and warnings, one must still visit these sites and search again!

This product information location problem is a symptom of the web’s more fundamental problems:

“For all its might, utility, and growth, what we have today is a scattered web, a web of destinations, on which finding information requires a whole expedition across loosely-connected archipelagos of data, each with its own information  requirements*, rules of engagement, and gravitational attempts at capturing your time and money.” (Boutin, 2010)

But waiting for the web to evolve is not practical.  We therefore present an alternative that provides a direct link between a product and information on that product, which we consider a good use of existing technology:

  • We explain the core change to product packaging that contains links to information. We give some high-level requirements of the system.
  • We give some guidelines on implementation.
  • We provide some benefits the system will provide.
  • We discuss the unknowns and shortcomings of the system.

2. Solution


Every product sold or advertised will use its product identifier for the creation of a unique link to a product information resource on the web.  An example is a soft drink having on its packaging the link http://www.cpia.gov/prodinfo/H2abd8d93, with an associated logo.  Note that cpia.gov, for Consumer Product Information Agency, is for example only.  No such agency or resource exists.  Also, the web address is not necessarily the target format of such links.  An alternative is to just present the required product identifier, which the consumer can use at (or use to construct a link to) the designated well-known product information (prodinfo) host.

This product information resource will produce a web page when accessed by the consumer using a browser; to other agents it will present an information resource that allows repurposing of the data.  To allow even easier access, a product or a product’s commercial website can also include a two-dimensional barcode, such as QR Code or Data Matrix, which can enable URI redirection to the information asset on a suitable mobile device.  An advanced system would use a Near Field Communication device such as an embedded RFID tag, though there are, of course, privacy concerns.
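The alternative of constructing the link from a bare identifier can be sketched in a few lines of Python. The host is the made-up cpia.gov example from the text, and the function name and URL layout are assumptions for illustration only.

```python
from urllib.parse import quote

# Example host from the text; no such agency or resource exists.
PRODINFO_BASE = "http://www.cpia.gov/prodinfo/"

def prodinfo_url(product_id: str, base: str = PRODINFO_BASE) -> str:
    """Build a prodinfo link from the identifier printed on a product."""
    product_id = product_id.strip()
    if not product_id:
        raise ValueError("empty product identifier")
    # Percent-encode the identifier so it is safe as a single path segment.
    return base + quote(product_id, safe="")

print(prodinfo_url("H2abd8d93"))  # http://www.cpia.gov/prodinfo/H2abd8d93
```

Because the link is a pure function of the identifier, a barcode reader or any other agent can produce it without a search step.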

The keys or claims here are:

  1. The requirement to have a standard URI to a usable authoritative third party standard information resource.  This allows centralization and repurposing of information.  This also ensures that the database is complete, unbiased, and durable.
  2. The requirement to have this number attached to the product in the form of a URI or identifier to be used to create a URL.
  3. The use of an information source and access point independent of trade item producer, thereby eliminating conflict of interest and other drawbacks.
  4. A focus on consumer use and secondarily on extended application reuse if possible.

The structure of the URI is secondary, though important.  Existing codes such as MPM, ISBN, EAN/UCC-13, UPC, GTIN, GS1 Identification Numbers, and others are usable.  A code should be non-significant, i.e., it should not encode identifying attributes; instead, a code is assigned through an authority agent.  Also, existing organizations that supply applicable standards, for example GS1, the International Association of Bedding and Furniture Law Officials (IABFLO), and others, can provide input into the requirements of this proposal.  What may be different is that the focus is on consumer use, not on commerce per se or on promoting consumerism.
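If an existing code such as a GTIN were reused, a prodinfo host could at least reject mistyped identifiers before doing a lookup, since those codes end in a check digit. The Python sketch below uses the standard GS1 mod-10 scheme (weights 3 and 1 alternating from the right); the function names are my own, and nothing here is specific to the proposed system.

```python
def gtin_check_digit(data_digits: str) -> int:
    """Check digit for the data part of a GTIN (every digit but the last)."""
    if not (data_digits.isascii() and data_digits.isdigit()):
        raise ValueError("data_digits must contain only the digits 0-9")
    # The digit next to the check digit has weight 3, then 1, 3, 1, ...
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(data_digits)))
    return (10 - total % 10) % 10

def is_valid_gtin(code: str) -> bool:
    """True if code is an 8, 12, 13, or 14 digit GTIN with a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])

print(is_valid_gtin("4006381333931"))  # EAN-13 example
print(is_valid_gtin("036000291452"))   # UPC-A example
print(is_valid_gtin("4006381333932"))  # last digit altered
```

The check digit catches typing errors only. It says nothing about whether the code has actually been assigned to a product.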

3. Advantages


3.1 Reduced Searching

Just reducing the amount of searching for product information on the web is an advantage.  More time can then be devoted to understanding and using that information.  This is not to imply product related searching will not occur.  There is still great value in using search engines, retail sites, reviews, tweets, and blogs.

3.2 Product Identification

Since we have a URI for a product, other sources are even more valuable since they can refer to an exact well-known ‘thing’.  This is how one can use Wikipedia now, to provide a URI on a subject.  Note that there is much related research in this area, with proposals such as “Published Subject Indicators” in the Topic Map standards effort.

3.3 Reuse

A resource with a known location can be accessed by external systems to provide added value, for example by using RESTful web services. Now one can use crowdsourcing and other social applications with authoritative information, not just rely on advertising.

4. Product Range

To be truly usable, information must be available for all products.  In practice that may not be possible.  Yet, if something was manufactured, assembled, grown, or discovered, there must be information on it somewhere.  So, it should be possible to import and centralize that information.

Services are also important.  Someone contemplating a treatment at a suntan parlor should be able to find usable information and not have to scour the whole web for it.

[todo: more to say here]

5. Prod Info standards


[todo:]  In the B2B, EDI, ebXML, CRM, and other realms there are already many schemas in use or proposed that incorporate product information.

6. Information Content


What should be on the information page?   First, there should be enough information to allow the consumer to decide on a possible action.  For example, if one is viewing information on a drug there should be information related to applicability, purchase, and use, such as that which typically accompanies prescriptions purchased at a pharmacy.  But, on the web, information content can be aggregated from multiple sources.  In addition, links to more detailed, arcane, or expert level data can be made available.  Clearly, the content will depend on product type, with most product pages containing a common base.

Much less clear is whether there should also be content or links to information that is negative or a contraindication.   Should a product page for a weight loss pill also indicate that the pill has not been proven to effect weight loss, that it’s just snake oil?  Should a page on a ladder state that the ladder is flimsy and has already killed many homeowners?  Should we state that a shampoo does nothing to enhance a color treatment?

Negative or statistical information could be presented in decreasing order of importance.  By analogy with web based maps, which allow zooming, a prodinfo page could allow zooming between levels of information.  A good example is how on the Slashdot page (http://slashdot.org) there are widgets that allow the user to vary the amount of postings that are hidden by depth and so forth.

Many issues must be addressed, such as the usual web security and privacy concerns and what features to support:
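One way to picture this zooming is as a filter over items tagged with a depth level and an importance. The Python sketch below is purely illustrative: the level names, the importance scale, and the sample items are all hypothetical and not part of any existing schema.

```python
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical depth levels, from plain prose to expert data."""
    CONSUMER = 1
    TECHNICAL = 2
    EXPERT = 3

@dataclass(frozen=True)
class Item:
    level: Level
    importance: int  # higher means more important (assumed scale)
    text: str

def zoom(items, level):
    """Items visible at the given zoom level, most important first."""
    visible = [item for item in items if item.level <= level]
    return sorted(visible, key=lambda item: -item.importance)

# Made-up content for an imaginary ladder page.
page = [
    Item(Level.CONSUMER, 5, "Recall notice"),
    Item(Level.CONSUMER, 3, "Maximum load"),
    Item(Level.TECHNICAL, 4, "Incident statistics"),
    Item(Level.EXPERT, 2, "Material test reports"),
]

for item in zoom(page, Level.TECHNICAL):
    print(item.text)
```

Zooming out to the consumer level hides the statistics, while zooming in never hides the recall notice, which stays at the top.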

  • Trade secrets:  Can a company refuse to divulge standard information?
  • Liabilities:   Who owns the information?
  • Images:   Should images be included?
  • Feedback:  Should customer feedback be allowed?  Funneled to company?
  • Misinformation:  How are corrections made?
  • Privacy:  Should user access be tracked?
  • Content ratings:  Should sex toys, for example, be viewable by default?

7. Information Creation


There are two major subsystems envisioned:
1. ProdInfo data: the raw datasets and references supplied by product creator
2. ProdInfo Information: structured information based on the prodinfo data and on external sources and links.

In addition to assigning (or accepting) a product identifier, product/service creators will also be required to supply domain-specific data for import into the proposed product data store. This will unfortunately be a burden: it may complicate their existing Product Information Management system, or require setting up a new system or procedures in order to supply the prodinfo data. This could be a B2B type of transaction.

The data is stored in both structured and unstructured datasets, like those found at data.gov. Additional information and data will also be aggregated or linked to when creating the data content. Thus, information from the FDA, FTC, HHS, EPA, NIH and others could be retrieved or linked to.

This data will then be accessed by product specialist editor personnel using a content management system to produce prodinfo information for storage.

The prodinfo is made available by dynamic web sources suitable for end-user (consumer) viewing. It is the final processing using both automated and human input that transforms the raw data into information that can be rendered to present a high signal to noise ratio.

8. Information Representation


Having information is a small part of the problem.   What is difficult is making it usable.  In terms of the consumer view, this is a job for information architects or designers in conjunction with usability experts and visual modeling solutions.  The application must follow user-centered design from the start.

Since the intended audience is diverse, the information presentation should be customizable, allowing, for example, the selection of tables vs. graphics.  There are many more visualizations available.  A great example that is also interactive is the “Snake Oil? Scientific evidence for popular health supplements” found at http://www.informationisbeautiful.net/play/snake-oil-supplements/.

9. Funding


9.1 Advertising

Advertising is not evil.  It is a prime means for consumers to be apprised of products and it has been a prime source of funding for many “free” services like television, radio, and now many internet based services and resources.

Allowing a ProdInfo presentation to include advertising may be a funding option.  However, there is a negative aspect of advertising when it becomes a means to foster consumerism using psychosocial manipulation such as sexual titillation and over-the-top, in-your-face, loud info-screams.  Would unobtrusive advertising on ProdInfo pages lead to future models selling sports cars?

A middle ground of advertising can be used where the consumer of the ProdInfo can choose to view a company’s advert on the page.  If not chosen, no vestige of the ad would be shown.

One other question is whether ads would be restricted to the product’s manufacturer or service provider or to all companies?

9.2 Sponsoring

Companies and people can take part in their community by various programs that allow one to adopt a highway, park, street, and other things.  Similarly, since ProdInfo benefits all, perhaps one can also adopt a category or some other part.

9.3 Fees

A company may choose to pay to receive information on how their product is viewed, such as page hits, feedback, and browser behavior.  A way of even sending back user feedback can be implemented.

10. Organization


10.1 Government hosting

Having this hosted and managed by a government agency seems the most appropriate.  The U.S. Consumer Product Safety Commission seems like a natural fit.  However, this commission seems more focused on safety issues than on general information dissemination.  The Food and Drug Administration is also relevant.  But, again, it is limited in scope and is not focused on general information.  Some products are not even tracked, for example:

Where can I get information about a specific dietary supplement?

Manufacturers and distributors do not need FDA approval to sell their dietary supplements. …  FDA does not keep a list of manufacturers, distributors or the dietary supplement products they sell. If you want more detailed information … contact the manufacturer of that brand directly. …

Another possible overseer of this information is the Federal Trade Commission’s Bureau of Consumer Protection.  However, currently it does not have a division whose aim is general product information, especially the reuse of that information for further consumer and business applications.

The most likely agent in the United States would be the U.S. Department of Commerce.  They already have great initiatives like data.gov.  If not, perhaps a new independent agency of the federal government can be created to oversee this, such as the Product and Services Information Agency.  This agency will have to interoperate with its corresponding international peers since products and services span national boundaries.

10.2 Non-profit

Wikipedia, Apache Foundation, Eclipse, and other successful projects have shown that alternative funding and organization can work very well.  ProdInfo can be run in one of these modes.

10.3 Commercial

Many commercial entities, especially in the Social Networking business, already have the knowledge and resources to host the ProdInfo.  Some, like Google, already have their own similar database to support their advertising monetization efforts.

Thus, a Yahoo, Microsoft, Google, FaceBook, MySpace, AOL, EBay, or Amazon can do this.  But, they shouldn’t.

11. Requirements


  • Standards based
    • W3C and/or ISO technologies (RDF, Topic Maps, Linked Data)
    • REST
    • Secure
    • Privacy protection
    • Syndication, so consumers can get updates on what they use or own.
  • Non-profit
  • No ads or promotions
  • Non-partisan
  • Lean and uncluttered: must be usable from mobile devices
  • Information depth hierarchy
    • For consumer use normal non-technical prose will be the norm.
    • Ability to see more technical information
    • Ability to get expert level information
    • Ability to query for data by agents
    • Ability to view information at higher category levels.
  • Services and tools (see data.gov for examples)
    • To aid actual use of the information
    • Embedded agents, to avoid search
  • Important information always present and syndicated by RSS
    • Product recalls
    • Warnings
    • Poison control and contact links
  • Documentation
    • Product Manuals (local preferred instead of links to them)
    • Specifications
    • Parts and links to BOM
    • Health label
  • Non-consumer information
    • Material Safety Data Sheet
    • Compliance
    • Expert and technical data
  • References
    • Information resources
      • Dictionary
      • Glossary
      • Encyclopedia
  • Comparative products
  • Agencies, Trade groups
  • Laws
  • Manufacturer references

12. Related Efforts


Searching for the term “consumer product information” will show many existing efforts to provide such a service.  The cpid project at http://www.whatsinproducts.com/index.php offers a glimpse of one possible format.  However, since the products selected are based on sales, it is incomplete. Interestingly, a National Institutes of Health site, http://householdproducts.nlm.nih.gov/about.html, uses the aforementioned cpid database.

More relevant is the ICEcat.biz service that offers an Open Catalogue of product content. However, it is not consumer focused. Perhaps its data could be repurposed to work with the proposed system. [todo: more research needed].

There are also various patents on similar systems.  Most recently USA Patent 6064979[1] has claims regarding the search of a database for manufacturer identification numbers (MIN), accessing a stored URL, and allowing the consumer to navigate to a manufacturer’s product site or, if no matching MIN is found, to the manufacturer’s home page.  Clearly there is “prior art”, but unlike the present proposal, it is not based on an independent repository of product information.

There are many independent web sites that provide product information.  However, the web is adversarial in the sense that there is competition to gain site hits and thus ad revenue.  So, a large proportion of product specific pages are actually counterproductive.

The concept of quick access to web based information by lookup on a product or advertisement using a barcode or ID is not new, of course.  The CueCat product, though a commercial failure, brought forth the concept, unique at the time, of a URL for product information (really for sales).  Most recently the Stickybits startup, http://www.stickybits.com/, uses barcodes for a social network application about stuff.

Another approach is to improve product information search itself using many different techniques such as federated search using agents.

[todo: about Conventional search engine improvements]

Search engines are rapidly evolving to use more information sources and combine the resulting search results to be more user-centric.  For example, searching for “gum” in Google results in their standard search results page.  But one can click on the “Shopping” link in the menu line and be presented with the results via Google Products.  Here one can drill down by category, price, brands, and so forth.  Presumably this is the result of a database of company and partner provided metadata.

[todo: about Research on federated search] [todo: about Agents] [todo: about Semantic Web]

Linked data and standard localized ontologies could supply the required metadata to go beyond simple text search.  This is  one of the benefits of the future Semantic Web.

[todo: about Social networks providing prodinfo]

13. Conclusion


Presented was a proposal of requiring that manufacturers and service providers furnish information that will be aggregated and repurposed.  The information will be assigned a unique URI based on the assigned unique product id.  This raises consumer IQ and allows for efficient and productive use of internet resources.

14. Links


“ISO starts work on safety standard for consumer products”, http://www.iso.org/iso/pressrelease.htm?refid=Ref1268

ICEcat.biz, http://www.openicecat.com/us/menu/services/index.htm

Open Catalogue, http://en.wikipedia.org/wiki/Open_catalogue

Thomas J. Perkowski. Patent 6064979, “Method of and system for finding and serving consumer product related information over the internet using manufacturer identification numbers”, 16 May 2000, http://www.google.com/patents/about?id=sEMEAAAAEBAJ, accessed 22 Mar 2010.

Boutin, Greg; “The Next Web to be User-Centric (Thoughts on David Siegel’s Pull Book)”, 18 Jan 2010, http://www.semanticsincorporated.com/2010/01/pulling-together-the-semantic-web-.html

“Product Information Management”, http://en.wikipedia.org/wiki/Product_information_management

Linked Data, http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

DBpedia, http://dbpedia.org/About

Digital Object Identifier (DOI), http://en.wikipedia.org/wiki/Digital_object_identifier





OASIS Published Subjects TC, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj

GS1, http://www.gs1us.org/

Semantic Web: http://www.w3.org/2001/sw/

Oracle Product Hub, http://www.oracle.com/master-data-management/pim_data_hub.html

Stock-keeping unit, http://en.wikipedia.org/wiki/Stock_Keeping_Unit

Part Number, http://en.wikipedia.org/wiki/Part_number

Uniform Resource Identifier, http://en.wikipedia.org/wiki/URI

UPC Database, http://www.upcdatabase.com/

SameAs:  http://sameas.org/

UPC Frequently Asked Questions, http://www.upcdatabase.com/docs/faq.asp

Glossary, http://www.gs1us.org/Glossary/tabid/58/Default.aspx

Google Base, http://base.google.com/

Google Merchant Center, http://www.google.com/merchants/?hl=en

Data.gov:  http://www.data.gov/

CueCat:  http://en.wikipedia.org/wiki/CueCat

“Web inventor calls for government data transparency”, http://news.bbc.co.uk/2/hi/technology/8572809.stm.  By Chris Vallance, BBC News

“Snake Oil? Scientific evidence for popular health supplements”,  http://www.informationisbeautiful.net/play/snake-oil-supplements/

Consumer Products Information Database, “Health Effect of Household Products”, http://www.whatsinproducts.com/index.php

“Data Matrix Barcode ISO/IEC 16022 FAQ”, http://www.idautomation.com/datamatrixfaq.html

i-nigma qr datamatrix barcode reader, http://itunes.apple.com/us/app/i-nigma-qr-datamatrix-barcode/id331895424?mt=8

Nano-based RFID tags could replace bar codes, http://www.eurekalert.org/pub_releases/2010-03/ru-nrt031810.php

All rights reserved. No part of this document may be reproduced or transmitted in any form by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from Josef Betancourt.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License,  See: http://creativecommons.org/licenses/by-nc-nd/3.0/

[1] I just stumbled onto this patent during a web search.  Somehow I did not think something like this could be patented.

