Application logging using unique id

For programs that write to log files a best practice is to include a tracking ID. What should this ID be and how to use it? The following is presented as a ‘design pattern’.

TL;DR

At the earliest inception of a ‘process’ create an ID which is a combination of an ‘origin id’ and ‘unique id’. This non-business related global inception ID (GIID), should be used for every log output to provide machine readability and tool use. When each new business level or operational id is obtained or generated, it is logged to provide an ‘association’ with this GIID. This is similar to various existing practices so it is presented as a Software Design Pattern.

Author: Josef Betancourt, Jan 11, 2015

CR Categories: H.3.4 [Systems and Software]; D1.3 [Concurrent Programming]; D.2.9 [Management]; K.6 [MANAGEMENT OF COMPUTING AND INFORMATION SYSTEMS]; K.6.3 [Software Management]


  1. Context
  2. Forces
  3. Solution
  4. Consequence
  5. Implementation
  6. Appendix
  7. Further reading
  8.  

Context

App Logs

An application log file records runtime programmatic events: details, exceptions, debug, and performance related measures. This is different than specialized server log files, such as a webserver’s access logs, error.log, etc. The latter are more standardized, for example with W3C Extended Log File Format, and well supported.

App logs usually capture details at specific logging levels and named contexts. In the Java ecosystem there are plenty of libraries to support this and now the JDK supports this as part of the java.util.logging package.

Despite the advances made in logging APIs and support within operating systems and frameworks, app logs are at a primitive level of software maturity. What A. Chuvakin and G. Peterson describe as the “… horrific world of application logging” is composed of ad hoc, minimally documented, unstructured, untested, and under-specified delivered components.

Attempts to create widely used industry standards have failed and every business, software project, dev team, or industry reinvents and attempt to tackle the same problems.
 

Forces

In the context of server app logs, multiple sessions will output log information that can be intermixed. These sessions can be part of an ongoing process, such as a user interaction with a web site.

External integration points (web services, database, etc) may also be invoked. Unless each session is identified in the log and integration logs, subsequent support, debug, and auditing are very difficult.

The problem is not just tracking where and when ‘integration’ occurred or its non-functional integration criteria (i.e., timing), but the tracking of subsequent logging, if any, at that location.

App logs are used extensively during development. Their importance is illustrated by an old mime “debuggers are for wimps”. As such, logs with impeccable tracking used for design and test are a good Return On Investment (ROI).

The same is true for deployed systems. In many cases the only information available on historical behavior is in a log file.

This seems like a programming 101 lesson, but it is widely disregarded in practice. That log file output is a minor concern and sometimes not even part of a “code review” is puzzling.

Scenarios

1. A service must invoke a distributed call to another system. The service has retry logic, and logs each failure. If each log output does not identify the session or operation, the retries could get mixed with other threads. Identifying an end user’s request is a hit or miss bracketing of time stamps if the development team did not enough identifiable data in each log output.

2. A computer savvy end user or family may attempt to register into your system with multiple browsers simultaneously. This could cause problems if multiple logins are supported and an error occurs. How do you track this and debug it?

3. The app server makes a remote call to a service integration point and that service fails. How is the owner of that service informed as to the specific invocation? There are probably deployed systems where one would have to compare time stamps on log output to even coordinate where the two systems communicated and even then it is vague. Some systems may not even do any logging and the unless there is a fault of some kind.

4. You have to identify time periods based on hazy user complaints, search through multiple log files with regular expressions, then walk each output to recreate a specific error scenario. Isn’t this manual drudgery what computers were supposed to eliminate?
 

Solution

Global Inception ID

Logging with Unique identifiers is encouraged as a best practice:

“Unique identifiers such as transaction IDs and user IDs are tremendously helpful when debugging, and even more helpful when you are gathering analytics. Unique IDs can point you to the exact transaction. Without them, you might only have a time range to use. When possible, carry these IDs through multiple touch points and avoid changing the format of these IDs between modules. That way, you can track transactions through the system and follow them across machines, networks, and services.” — http://dev.splunk.com/view/logging-best-practices/SP-CAAADP6
 

This unique identifier is generalized so that on first entry into a system or the start of a process, A Global Inception ID (GIID), is assigned to distinguish that ongoing process from others. A more descriptive term would be a Global Tracking ID, but that conjures up privacy and security concerns and is already being used in commerce for a different purpose. But ‘inception ID’ brings up visions of barcodes on people’s foreheads. Ok, how about “bludzwknxxkjysjkajskjjj”?

The term “Global” is to indicate that this ID is unique in a specific enterprise system. The uniqueness comes from its creation on a first contact basis on a specific subsystem. In essence this is a log tracking ID.

For example, a web server or an app server would be the first point of contact or request from an external User. The GIID, consisting of a combination of origin id and a unique id, would be created at this point. GIID ::= originID uniqueID

In article “Write Logs for Machines, use JSON” Paul Querna uses the term “txnId” for this type of ID:

“… this is essentially a unique identifier that follows a request across all our of services. When a User hits our API we generate a new txnId and attach it to our request object. Any requests to a backend service also include the txnId. This means you can clearly see how a web request is tied to multiple backend service requests, or what frontend request caused a specific Cassandra query.”
 

Another term for this GIID, or ‘txnId’ is Correlation ID. This terminology is used in SharePoint.

 
The correlation ID is a GUID (globally unique identifier) that is automatically generated for every request that the SharePoint web server receives.

Basically, it is used to identify your particular request, and it persists throughout further communications, if any, between the other servers in the farm. Technically, this correlation ID is visible at every level in the farm, even at a SQL profiler level and possibly on a separate farm from which your SharePoint site consumes federated services. So for example, if your request needs to fetch some information from an application server (say, if you are using the web client to edit an Excel spreadsheet), then all the other operations that occur will be linked to your original request via this unique correlation ID, so you can trace it to see where the failure or error occurred, and get something more specific than “unknown error”. — https://support.office.com/en-nz/article/SharePoint-2010-Correlation-ID-in-error-messages-what-it-is-and-how-to-use-it-5bf2dba7-43d2-484c-8ef4-e059f76e3efa

 

Contextualized IDs
Another related term is called “Contextualized IDs”, by Michael Nygard:
“… it’s important that IDs carry along their context. It isn’t enough to have an alphanumeric Policy ID field. You need a URN or URI to identify which policy system issued that policy number.”
— Micheal Nygard, ‘Inverted Ownership, http://www.michaelnygard.com/blog/2015/05/inverted-ownership/

Various ‘Structured Logging’ efforts or syslog implementations already contain a ‘sending’ field specification. The GIID incorporates the sender id as the Origin ID, and this combination is more amendable to human and textual tools parsing.

 

Consequence

Size

A good candidate for a GIID must be large enough to satisfy uniqueness requirements. This could be, for example, a 36 character field. Where the log files are manually inspected with a text editor, this increases the log line which already contains many common fields like a time stamp.

Security

Unintentionally, “bad” logging practices makes it harder to track and correlate personally identifiable information (PIN). With the use the trans-system GIID, correlation between various business related identifiers is made easier.

The correlation ID is not necessarily a secret, but like other tracking objects like cookies, can be used for information discovery or questionable information storage. But, if an attack can already access your log files, there are other more serious issues?

Redundancy

What determines the first contact subsystem? A true distributed system could be configured or dynamically configured so that any system could be the first contact system. If so, then each subsystem is creating GIID and passing that GIID to other systems that are themselves creating GIIDs.

One approach to handle this is that a system will only create a GIID if none is present in the incoming request.

Feedback

For user interfaces, exposing the GIID or parts of it in exception situations can be beneficial:

“We also send the txnId to our user’s in our 500 error messages and the X-Response-Idheader, so if a user reports an issue, we can quickly see all of the related log entries.” — https://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
 

Compare this to the Hunt The Wampus adventure in enterprises that only have an approximate time of an issue and must track this over multiple systems.

Accuracy

If a giid is part of a support system and as above the ID would be shared with Users in some way, would the value need some form of validity testing? Should it be tested that it is wellformed and include a checksum?

Example crc calculation for a UUID, based on textual version of id:

groovy -e "java.util.zip.Adler32 crc = new java.util.zip.Adler32(); crc.update(UUID.randomUUID().toString().getBytes());println Long.toHexString(crc.getValue())"
af9c09a3

 

Implementation

Origin ID

An OID uniquely identifies a system in an enterprise. This could be a web server or messaging system. Using a code for the actual system is recommended. Thus, instead of Acctsys, it would be a code, PE65K for example. Using a code is more ‘durable’ than a human readable name.

An extension is to also encode other information in the origin ID, such as application or subsystem identifiers.

This could even reuse various ‘naming’ standards, as found, for example in Directory services such as LDAP.

Unique ID

This ID must not be a business related entity such as user id or account number. The simple reason is that these may occur in the logging record multiple times for different sessions or transactions. For example, user Jean Valjean with account number 24601 may log in multiple times into a web site. Tracking a specific session if a problem occurs is easier if we use a unique ID.

A business level ID may not even be relevant in another system that interacts with the origin point. In one system the ID could be one type of ID, and in the other the same party or user could be identified with a different ID.

Note that as soon as determined, accessed, or generated, a business level ID should be associated with the GIID. This could be a simple log output of that business ID which, since every log output has a GIID, will associate the business ID or value with the GIID.

Similarly when the same process communicates with another system, that system’s unique identifiers and related business IDs will also be associated with the GIID. For example, a web service will take the GIID and relate it to its own ID(s). Now a support engineer can follow the complete path of the system response to a user session.

ID creation

The easiest approach is to use the entry system’s session id created by the server. A potential problem is that this session id is not guaranteed to be unique and ends when the session expires. A UUID solves most problems.

Sample UUID generation in Groovy language:

groovy -e "println UUID.randomUUID().toString().replace('-','')"
1f788da1ac4a43bb82adb8e61cfcb205 

If the system ID is 3491 then the above UUID is used to create the GIID and use in logging:

20110227T23:34:37,901; EV={_ID:”34911f788da1ac4a43bb82adb8e61cfcb205″, USERID:”felixthecat”, ….}

Alternative to UUID use?

A UUID is a 32 character string. Could something smaller be used? Perhaps, but eventually the complexity of threaded systems would make the uniqueness constraint of any ID approach a comparable length.

Other approaches are possible. Most of these will incorporate a timestamp in some way. Note that a UUID contains a timestamp.

An example of a ‘unique’ id is used by MongoDB’s ObjectID specification. That spec calls for a 12-byte BSON type of:
• a 4-byte value representing the seconds since the Unix epoch,
• a 3-byte machine identifier,
• a 2-byte process id, and
• a 3-byte counter, starting with a random value.
An example of an ObjectID string representation is ObjectId(“507f1f77bcf86cd799439011”)

Log Framework support for GIID

The use of a GIID is a ‘cross-cutting’ concern. Requiring programmatic use of this ID would be burdensome and error-prone, even if stored in a thread-safe context.

Some logging frameworks support the concept of “nested diagnostic contexts”. This is a way of storing an ID so that interleaved logging is properly identified. See http://wiki.apache.org/logging-log4j/NDCvsMDC for more information.

Example usage

In a conventional Java server application a JSP or template system would obtain a GIID and insert it into generated pages and associated client side scripts. That GIID would also be stored in the server side session. Since the GIID is stored at the session it is accessible to the various services and components on a server.

This ID is embedded in request to other distributed servers and provides event correlation. Thus the logging systems will have access to the GIID and Client or server side errors can then display or use the GIID for tracking and reporting to assist support engineers.

Since the client also has the GIID, it can display or use this ID for customer service processing.

Of course, this would make more sense if it is a part of a wider Application Service Management (ASM) system.

Standards for IDs

Though many standards specify ID handling, modern architectures, especially web based or distributed, emphasize a stateless protocol. A GIID requirement could be one of those legerdemain stateful practices.

Development impacts

If logging is a deployed feature of an application then it too needs testing. But, since log output is an integration point, it does not fall under the “unit” testing umbrella. There is even some doubt if this should even be tested! Here is one example: Unit Testing: Logging and Dependency Injection
If log files can contain security flaws, convey data, impact support, and impair performance, then they should be tested that they conform to standards. Project management spreadsheets needs to add rows for logging concerns.

Technology

Log output can be developer tested using the appropriate XUnit framework, like JUnit.
Mocking frameworks provide a means of avoiding actually sending the output of a logger to an external ‘appender’. “Use JMockit to Unit test logging output”.
Issues
In development of a project, the log output changes rapidly as the code changes. Selecting where in the software development life cycle (SDLC) to test logging or even specify what logs should contain is difficult.
One approach is that the deployed system will not do any app logging that was not approved by the stake holders. These must be “unit” tested, and all development support logging is removed or disabled except for use in a development environment.

Deployment

There is no need to change every subsystem to use this log tracking approach. If the GIID is created somewhere in the “path” of a process or service, it adds value. Other systems can gradually start to use a tracking ID. Thus, the tools and training to use this capability can also be progressively introduced.

About this post

I was going to title this article ‘Logging in Anger’, as a response to my own experiences with application logging. Alas, there are so many issues that I had time to only focus on one as a result of a recent stint supporting an application that exhibits the typical logging anti-patterns. Example: it’s bad to get a null pointer exception, but to not know which argument to a function caused this?
 

Appendix

Structured Logging

(this article was going to add more info on incorporating a GIID into a Structured Logging framework. This section is here for refernce)
Structured Logging is a type of app log file that is data based rather than prose based. Thus, it is machine readable and amendable to high-level tools, not just a text editor.

Treating logs as data gives us greater insight into the operational activity of the systems we build. Structured logging, which is using a consistent, predetermined message format containing semantic information, builds on this technique …. We recommend adopting structured logging because the benefits outweigh the minimal effort involved and the practice is becoming the default standard. — http://www.thoughtworks.com/radar/techniques/structured-logging

 
An example, is a system where the log output uses a predetermined message format. An overview of such systems is found in chapter 5 of “Common Event Expression”, http://cee.mitre.org/docs/Common_Event_Expression_White_Paper_June_2008.pdf

Note this should not be confused with a similar sounding technology called “Log-structured file system”.
 

Further reading

  1. ivot Tracing: Dynamic Causal Monitoring for Distributed Systems
  2. Dapper, A Large Scale Distributed Systems Tracing Infrastructure
  3. Log management and intelligence, http://en.wikipedia.org/wiki/Log_management_and_intelligence
  4. Logging a global ID in multiple components, http://stackoverflow.com/questions/1701493/logging-a-global-id-in-multiple-components
  5. Application Service Management (APM) system
  6. Application performance management, http://en.wikipedia.org/wiki/Application_performance_management
  7. The art of application logging, http://www.victor-gartvich.com/2012/05/art-of-application-logging.html
  8. Patterns For Logging Diagnostic Messages, http://c2.com/cgi/wiki?PatternsForLoggingDiagnosticMessages
  9. UUID, UUID
  10. How to test valid UUID/GUID?
  11. Log Data as a Valuable Tool in the DevOps Lifecycle (and Beyond), http://devops.com/features/log-data-valuable-tool-devops-lifecycle-beyond/
  12. OWASP – Logging Cheat Sheet, https://www.owasp.org/index.php/Logging_Cheat_Sheet
  13. How to Do Application Logging Right, http://arctecgroup.net/pdf/howtoapplogging.pdf
  14. Request for comment Structured Logging, http://www.mediawiki.org/wiki/Requests_for_comment/Structured_logging
  15. 6 – Logging What You Mean: Using the Semantic Logging Application Block, http://msdn.microsoft.com/en-us/library/dn440729(v=pandp.60).aspx
  16. A Review of Event Formats as Enablers of event-driven BPM, http://udoo.uni-muenster.de/downloads/publications/2526.pdf
  17. Basic Android Debugging with Logs, http://www.androiddesignpatterns.com/2012/05/intro-to-android-debug-logging.html
  18. Mapped diagnostic context vs Nested diagnostic context, http://wiki.apache.org/logging-log4j/NDCvsMDC
  19. Building Secure Applications: Consistent Logging, http://www.symantec.com/connect/articles/building-secure-applications-consistent-logging
  20. Log for machines in JSON, https://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
  21. Logging Discussion, http://c2.com/cgi/wiki?LoggingDiscussion
  22. CEE, http://cee.mitre.org/docs/Common_Event_Expression_White_Paper_June_2008.pdf
  23. CEE is a Failure., https://gist.github.com/jordansissel/1983121
  24. Centralized Logging Architecture, http://jasonwilder.com/blog/2013/07/16/centralized-logging-architecture/
  25. Centralized Logging, http://jasonwilder.com/blog/2012/01/03/centralized-logging/
  26. Logging and the utility of message patterns, http://calliopesounds.blogspot.com/2014/07/the-power-of-javatextmessageformat.html?m=1
  27. Payment Application Data Security Standard, https://www.pcisecuritystandards.org/documents/pa-dss_v2.pdf
    Payment application must facilitate centralized logging.
    Note: Examples of this functionality may include, but are not limited to:
    • Logging via industry standard log file mechanisms such as Common Log File System (CLFS), yslog, delimited text, etc.
    • Providing functionality and documentation to convert the application’s proprietary log format into industry standard log formats suitable for prompt, centralized logging.
    Aligns with PCI DSS Requirement 10.5.3
  28. NIST 800-92 “Guide to Computer Security Log Management”, http://csrc.nist.gov/publications/nistpubs/800-92/SP800-92.pdf
  29. UniqueID Generator: A Pattern for Primary Key Generation, http://java.sys-con.com/node/36169
  30. java.util.logging, http://docs.oracle.com/javase/7/docs/api/java/util/logging/package-summary.html

Similar Posts:

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

One thought on “Application logging using unique id”

Leave a Reply

Your email address will not be published. Required fields are marked *