Talk:Globally unique identifier/Archive 1
This is an archive of past discussions about Globally unique identifier. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Untitled
Is anyone familiar with the term "ProgID" to mean a text alias for a CLSID? It is used in developing COM objects for ArcGIS. If this term is in common use, we ought to add it.
I'm changing "mathematically guaranteed" to "statistically guaranteed". Statistically, a 1 in 2^128 chance may seem infinitesimal. Mathematically, it is no closer to infinitesimal than, say, 1 in 4. Admittedly, the phrasing here is a matter of style, or personal preference. (Oops, and it looks like I wasn't logged in when I did the edit. Sorry about that.) --DavidConrad 03:45, 23 Jun 2005 (UTC)
- ProgID is described quite well here: [1] DRead 23:42, 24 October 2006 (UTC)
- I believe that the original (V1) algorithm contained several features to procedurally guarantee the uniqueness of a GUID, at least on this planet. Primary among these features was the use of a MAC address in the GUID, since a centralized authority is used to assign ranges of MAC addresses to equipment vendors. Pressure over the perceived privacy issues related to embedding a MAC address in a GUID led Microsoft to change the default algorithm used for GUID generation to one which uses a random number in the last segment. This weakens the level of guarantee of uniqueness to a purely statistical one. However, at the same time, Microsoft introduced an extended API allowing callers to generate GUIDs using the original algorithm. Burt Harris 21:34, 14 January 2006 (UTC)
Not really 2^128 unique numbers
If, as the Globally Unique Identifier#Algorithm chapter says, GUID reserves parts of its layout for versioning, then there cannot be 2^128 unique numbers. --Abdull 14:39, 4 April 2007 (UTC)
- Oh, and one more: who or what is the mentioned OSF? --Abdull 14:52, 4 April 2007 (UTC)
- Both fixed. Shinobu 00:39, 27 June 2007 (UTC)
This is pretty poor.
- The definition of guid from guiddef.h is as shown below:
- [...]
- i.e. a concatenation of a 4-byte number, two 2-byte numbers and eight single bytes.
Except that this is C code, where these sizes are only true on platforms where longs are 32-bit, shorts 16-bit, and chars 8-bit, none of which is guaranteed by the C standard at all.
- This text notation follows from the data structure defined above.
Except it doesn't, because Data4 is split in two in the text notation, while it's a single item in the data structure.
Not impressive. I'd fix it myself, but the page is asking for an expert, and that I am not. 81.86.133.45 15:40, 25 March 2007 (UTC)
- Fixed. The article was indeed rather lousy on this point. Shinobu 01:12, 27 June 2007 (UTC)
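For anyone following the Data4 point above, here is a minimal Python sketch of how the five text groups relate to the four structure fields. It assumes the standard-library uuid module and reuses the example GUID quoted further down this page; the variable names data1 to data4 are only illustrative labels for the fields of the C structure.

```python
import uuid

# Example GUID quoted elsewhere in this discussion (braces are accepted).
g = uuid.UUID("{3F2504E0-4F89-11D3-9A0C-0305E82C3301}")

raw = g.bytes        # the 16 bytes, fields in big-endian order
data1 = raw[0:4]     # 4-byte field
data2 = raw[4:6]     # 2-byte field
data3 = raw[6:8]     # 2-byte field
data4 = raw[8:16]    # eight single bytes

# The text notation has five groups: 8-4-4-4-12 hex digits.
groups = str(g).split("-")

# Data4 is a single 8-byte item in the structure, but the text form
# splits it into a 2-byte group followed by a 6-byte group.
print(groups)
print(data4[:2].hex(), data4[2:].hex())
```

So the first three text groups map one-to-one to Data1, Data2 and Data3, while the last two groups together make up Data4.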
List of common GUID's?
Perhaps it would be nice to have a list of common GUID's in the article.
Like e.g.:
{00000000-0000-0000-C000-000000000046} IUnknown
{00020400-0000-0000-C000-000000000046} IDispatch
{00020401-0000-0000-C000-000000000046} ITypeInfo
{00020402-0000-0000-C000-000000000046} ITypeLib
{00020403-0000-0000-C000-000000000046} ITypeComp
{00020404-0000-0000-C000-000000000046} IEnumVARIANT
{00020405-0000-0000-C000-000000000046} ICreateTypeInfo
{00020406-0000-0000-C000-000000000046} ICreateTypeLib
{20D04FE0-3AEA-1069-A2D8-08002B30309D} My Computer
{21EC2020-3AEA-1069-A2DD-08002B30309D} Control Panel (Windows)
Information like this is currently a pain to find. (Disclaimer: I haven't checked these, I just searched the web. You can also find some GUID's in the registry and in the Windows header files.) Of course there would be the question of what to include and what not to include, but I'm sure we can figure something out.
It would also be nice if 00000000-0000-0000-C000-000000000046 linked to IUnknown, but I'm not sure what Wikipedia policy says about that, so I won't make these redirects (unless prompted to do so at least). Bye, Shinobu 00:56, 26 June 2007 (UTC)
- On the face of it a GUID list/index seems like a good idea, with the caveat that the MS UUID spec could very well be completely different for Vista (MS not being big on backwards-compatibility and all). I'd advise sticking to only the very most common identifiers, otherwise that list will get unwieldy in a hurry. As for the redirect question... well, that begs a few additional questions. Could you create redirs for common GUID's, i.e., is it feasible? Sure: I've heard the refrain "redirects are cheap" parroted a few times by WP'ers. Should those redirs be created? I don't know. If "00000000-0000-0000-C000-000000000046" and the like are going to be common search terms, then absolutely. If not, then I could foresee complaints that a GUID/UUID is too trivial a thing to be creating a redir for (I suppose there's also an outside chance that an illegal number type copyright complaint could be lodged too). That's my two centidollars. Groupthink 22:27, 27 June 2007 (UTC)
All the GUID's in the Platform SDK (at least those that are part of the API) are necessarily public. It is likely that "illegal GUID's" exist, but I wasn't thinking about those. Note that some of the GUID's from the Platform SDK are used elsewhere on Wikipedia, and in WINE and ReactOS as well. Anyway, I could try to compile and categorize a list. If it's decided we want it, that is. Shinobu 09:15, 28 June 2007 (UTC)
- I fear that "public" and "illegal" aren't necessarily mutually exclusive. If I were to post the entire Java class library, for instance, I don't think Sun would be too happy with me. Let me reiterate though that I think that's at best a secondary concern: the primary questions IMHO are utility and maintainability. Groupthink 13:04, 28 June 2007 (UTC)
In this case however, they are. Because they are part of the API of Windows and various other programs, even non-Microsoft software, programmers need to be able to include these GUID's in their programs without Microsoft then dictating the licensing terms. When you just interface the API, by default what you program yourself is your own intellectual property, in exactly the same way that Microsoft cannot forbid you to use the function name GetDIBits and in the same way as programming for GNU/Linux cannot legally force you to make your program available under the GPL. If Microsoft had wanted to fix its licensing terms upon us, it should have clearly stated those, as indeed it did with the Windows Media SDK. Instead, Microsoft chose not to do so and indeed emphasises the open and cross-platform nature of COM. Note also that these numbers are not secret decryption keys or something similar, but identifiers. IID 00000000-0000-0000-C000-000000000046 is a synonym for the IUnknown interface. Furthermore note that your example is a bad one in two ways. 1) You're actually talking about Sun's code here, not about the API. Your scenario is much more akin to making illegal copies of Windows. 2) Sun actually won't mind, since they're going open source and free software all the way with Java. Of course, IANAL, and even if I were, the Wikipedia servers are in Florida, a place with very strange and convoluted laws, or so I'm told. Shinobu 08:46, 4 July 2007 (UTC)
- Let me re-reiterate that the primary concerns IMHO are utility and maintainability. As for the Sun example: I was talking about the API, not the code. The scenario I was worried about was, say for example, carbon-copying this page to Wikipedia and trying to hide behind "well it's open-source anyway" as an excuse. I still don't think you'd be absolved of licensing issues in that case. And let's remember that this is Microsoft we're talking about here. If they were to complain about a comprehensive list of GUID's, they might not have a leg to stand on, but they could conceivably make enough of a stink to cause problems. Groupthink 05:16, 5 July 2007 (UTC)
I disagree strongly with including a list like this. This is an encyclopedia article that is for discussing the facts of the topic with prose, not listing trivia. I'm sure some people might find this information useful, but an encyclopedia is not the place to go to for such highly specific and frankly largely uninteresting information. If it's hard to find on the web then it would be worth submitting somewhere more suitable, like a technical blog. Remy B 14:27, 2 September 2007 (UTC)
CLSIDs
I've found the following list of CLSIDs here. I don't think this is really trivia, but rather technical information about CLSIDs/Windows.
{D20EA4E1-3957-11d2-A40B-0C5020524153} Administrative Tools
{85BBD920-42A0-1069-A2E4-08002B30309D} Briefcase
{21EC2020-3AEA-1069-A2DD-08002B30309D} Control Panel
{D20EA4E1-3957-11d2-A40B-0C5020524152} Fonts
{FF393560-C2A7-11CF-BFF4-444553540000} History
{00020D75-0000-0000-C000-000000000046} Inbox
{00028B00-0000-0000-C000-000000000046} Microsoft Network
{20D04FE0-3AEA-1069-A2D8-08002B30309D} My Computer
{450D8FBA-AD25-11D0-98A8-0800361B1103} My Documents
{208D2C60-3AEA-1069-A2D7-08002B30309D} My Network Places
{7007ACC7-3202-11D1-AAD2-00805FC1270E} Network Connections
{2227A280-3AEA-1069-A2DE-08002B30309D} Printers and Faxes
{645FF040-5081-101B-9F08-00AA002F954E} Recycle Bin
{E211B736-43FD-11D1-9EFB-0000F8757FCD} Scanners and Cameras
{D6277990-4C6A-11CF-8D87-00AA0060F5BF} Scheduled Tasks
{7BD29E00-76C1-11CF-9DD0-00A0C9034933} Temporary Internet Files
{BDEADF00-C265-11D0-BCED-00A0C90AB50F} Web Folders
Ideas? Fosnez 02:20, 14 September 2007 (UTC)
Text Encoding
Is 7QDBkvCA1+B9K/U0vrQx1A== actually the Base64 encoded value of {3F2504E0-4F89-11D3-9A0C-0305E82C3301} or is it just an example of the format...? —Preceding unsigned comment added by 206.170.173.254 (talk) 17:24, 6 November 2007 (UTC)
base64 encoding
The section about base64 encoding is wrong or at least misleading. I just spent half an hour trying to figure out how to get from the 37 characters of a UUID to 22. Encoding a UUID in base64 actually doesn't give 22 characters but 49. What happens is that some people apparently convert UUIDs to base64 and then arbitrarily shorten them to 22 characters. See for example: http://blog.madskristensen.dk/post/A-shorter-and-URL-friendly-GUID.aspx
However I don't think there's any point keeping that in the article. Yes we can shorten a base64/UUID string to 22 characters or even to 10 or 2 if we want, but that only makes the UUID less and less likely to be unique.
So I suggest deleting this section about base64 encoding if nobody's against it.
It would be good to check the Ascii85 technique as well, as it doesn't seem right either. I tried converting the string in the article "5:$Hj:Pf\4RLB9%kU\Lj" and the result wasn't a UUID. —Preceding unsigned comment added by Laurent1979 (talk • contribs) 16:44, 20 October 2008 (UTC)
- You misunderstood a lot of things. First, characters and bytes are two distinct things. Second, to the computer, a UUID is a 16-byte number, represented as a human-readable string of 32 hexadecimal characters, plus 4 dashes (36 characters). What you convert to Base64 is not the human-readable representation, but the raw 16-byte number. So, people are not "arbitrarily shortening" anything. 16 bytes encoded as Base64 gives a 24-character string, 22 if you remove the padding. So the article is correct.
- --Juliano (T) 20:43, 20 October 2008 (UTC)
- Ok my bad - and many thanks for the explanation. Actually, it is quite useful and I wonder if it should be somehow integrated to the article? Maybe we could just say that the conversion is done by taking the 16-byte number and feeding it to a Base64 encoder? Laurent (talk) 10:59, 21 October 2008 (UTC)
- No problem. Perhaps it could be better clarified under the section Text encoding that the three encodings refer to different ways of representing the 16-byte number. But this article, and Universally Unique Identifier are begging for a merge, plus a complete overhaul. Information is duplicated in these two articles, this article is confusing and goes way beyond the definition of GUID into the internals of the Windows operating system (is this even encyclopedic?), while the other one has a useless list of implementations in many different programming languages... --Juliano (T) 17:08, 21 October 2008 (UTC)
- Totally agree. Someone has to merge and cleanup all this stuff, but with care, distinguishing what is standard from what is an implementation with specific features. Concerning text encoding of GUIDs, here is a short article, in which different encodings are being tried out: [2]IPonomarev (talk) 07:56, 23 October 2008 (UTC)
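A minimal Python sketch of the point made in this thread, assuming only the standard-library uuid and base64 modules: the Base64 encoder is fed the raw 16-byte value, not the 36-character text form. The GUID is the example one quoted in the question above; no claim is made here about which Base64 string the article showed.

```python
import base64
import uuid

g = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

text_form = str(g)     # human-readable: 32 hex digits plus 4 dashes
raw = g.bytes          # the underlying 16-byte number

# Encode the raw bytes, not the text form.
encoded = base64.b64encode(raw).decode("ascii")
unpadded = encoded.rstrip("=")   # drop the two '=' padding characters

print(len(text_form), len(raw), len(encoded), len(unpadded))

# Restoring the padding makes the short form fully reversible,
# so nothing is lost by dropping it.
restored = uuid.UUID(bytes=base64.b64decode(unpadded + "=="))
print(restored == g)
```

Since the padding is always the same two characters for a 16-byte input, the 22-character form carries all 128 bits.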
how about a disambiguation article?
We have seen that GUIDs can be interpreted in the strict Microsoft sense used in MSSQL and ActiveX/COM, and that the term can also be used generally. We could disambiguate GUID (Microsoft) and GUID (ITU). Wikipedia should reflect other sources as well as possible; if divergent sources exist out there, a disambiguation may reflect the discussion better than one GUID article that tries to squeeze all aspects in. Leobard 13:59, 15 March 2007 (UTC)
GUIDs have been in use since at least 1983, long before Windoze. But the first ones I ran across were 128-bit. —Preceding unsigned comment added by 199.44.26.11 (talk) 02:18, 20 April 2009 (UTC)
Sequential Algorithms
I've marked the "performance issue when inserting records" claim as needing a citation, since it's been my understanding that the opposite is true (creating a hot spot where all records are inserted [at the end of a clustered PK] increases resource contention for that page/location in a database). ― Brianary (talk) 18:26, 6 July 2010 (UTC)
variant field spec looks wrong
"One to three of the most significant bits of the second byte in Data 4 define the type variant of the GUID:"
I believe this is supposed to say the *first* byte in Data 4 . The second byte would be octet 9, and RFC 4122 states: "The variant field consists of a variable number of the most significant bits of octet 8 of the UUID." —Preceding unsigned comment added by TransAMrit (talk • contribs) 22:32, 22 July 2010 (UTC)
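A quick way to check the octet numbering, sketched in Python with the standard-library uuid module. The example GUID is the one used elsewhere on this page, and the octets are counted from 0 as in the RFC 4122 wording quoted above.

```python
import uuid

g = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

# Octets 0-3 are Data1, 4-5 Data2, 6-7 Data3, so octet 8 is the
# FIRST byte of Data4 and octet 9 is the second.
octet8 = g.bytes[8]
octet9 = g.bytes[9]

# For this example the two most significant bits of octet 8 are 1 0,
# which is the RFC 4122 variant.
top_two_bits = octet8 >> 6

print(hex(octet8), hex(octet9), bin(top_two_bits))
print(g.variant == uuid.RFC_4122)
```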
Way too implementation specific
This article only vaguely defines GUID, which is a concept with many potential implementations, and then dives right into a particular implementation (a 32-character string). This format prioritizes the memory footprint of the GUID, but other forms of GUIDs have other priorities. This article should discuss the concept at a high level and then either link to particular implementations or have an implementations section. See http://bioimages.vanderbilt.edu/pages/guid.htm for a completely different set of priorities surrounding GUIDs (size of the GUID is more or less the last thing that they care about). And that page is the #8 hit when I search for GUID in Google... so it's got to be referenced in a bunch of places on the web. —Preceding unsigned comment added by 63.117.239.130 (talk) 13:05, 7 April 2011 (UTC)
Case Sensitivity
Someone referenced this page to me. As part of their explanation, they rationalized that since the values were based on hexadecimal digits, they were case sensitive. I don't know how that connection was made from reading the text of this WIKI, but an explanation of the significance, if any, of case in the GUID would be helpful. Possibly along with a side note that hexadecimal values are not case sensitive.
Thanks, non_ame@yahoo.com
66.193.125.254 (talk) 16:46, 18 January 2008 (UTC)
- Hexadecimal digits are case insensitive by definition of the term. Hexadecimal digits are simply extending the well known set of decimal digits 0,1,2,3,4,5,6,7,8 and 9 with the first 6 letters A, B, C, D, E and F (or a, b, c, d, e and f) meaning ten, eleven, twelve, thirteen, fourteen and fifteen. 77.215.46.17 (talk) 23:53, 18 April 2011 (UTC)
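To illustrate the answer above, a small Python sketch using the standard-library uuid module: the upper-case and lower-case spellings parse to the same 128-bit value. The GUID is just the example used elsewhere on this page.

```python
import uuid

upper = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")
lower = uuid.UUID("3f2504e0-4f89-11d3-9a0c-0305e82c3301")

# Both spellings denote the same number, so the parsed values are equal.
same_value = upper == lower

# Individual hex digits behave the same way.
same_digit = int("F", 16) == int("f", 16)

print(same_value, same_digit)
```

Comparing the text forms as plain strings would of course still be case sensitive, which is one reason to compare parsed values rather than strings.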
why Globally Unique Identifier?
Why is this at Globally Unique Identifier and not Globally unique identifier, with a redirection from the latter to the former? I just fail to see how this is a proper noun, or any other reason for that particular capitalization. —Preceding unsigned comment added by Rchandra (talk • contribs) 12:40, 4 June 2010 (UTC)
- Because it is the expansion of an acronym G U ID, but writing Globally Unique IDentifier to make it even clearer just looks too silly. 77.215.46.17 (talk) 23:56, 18 April 2011 (UTC)
Doubt that this article has a NPOV. Incorrect content! Pls. review!
Sorry... but this article is just a piece of... RNGs or better said PRNGs do not produce 128bit numbers by default. The numbers produced depend directly on a seeded value. If the seed value is the same a PRNG will produce the same sequence of numbers. If the seed value is not of 128bit by default or ... then there will be some problems.
Just a subset of all possible GUIDs will ever be produced. There will be a HIGH rate of collisions. The probability of a collision is much higher than assumed in this article with the described procedure to produce the GUIDs.
The term GUID is not just used in the Microsoft world. Microsoft is not the standardization committee for GUIDs.
GUIDs are not 128-bit by default. Doing a simple lookup in Google I easily found a lot of websites referring to GUIDs of length 32 bits, 16 bits, and 160 bits, too. If this article intends to describe Microsoft GUIDs (if some clean de facto (not de jure) standard like that exists) then we should do a clean disambiguation.
It's a misleading and incorrect article, stressing the Microsoft Corporation to such a degree that it might be more advertisement than explanation of what a GUID really is and what might be an optimal design for a GUID.
Please review that article and make sure the NPOV is given!
--212.114.211.8 09:18, 10 August 2006 (UTC)
- I believe that perhaps this article should be merged into Universally Unique Identifier. The term GUID (versus UUID) was, from the best I could ever discover, a Microsoft-invented term. They had taken several basic ideas from already-existing OSF DCE (including the UUID) and in typical fashion given it a new name when they created their own re-implementation of it. Until the last few years the term GUID was almost exclusively in reference to Microsoft usage, and UUID was the term used pretty much elsewhere. However, other than perhaps specific generation algorithms and terminology, a UUID and GUID are basically the same thing and the terms are now used (or misused) almost interchangeably, and should probably be discussed within the same article. In fact the UUID is really sort of a super-set of the GUID. The earliest mention of both together is found in the Leach draft from 1998 (although UUIDs had existed many years prior)[3] It may well be that the terms have recently seen some application with a more generic meaning for all kinds of unique identifiers, but in the "standardized" and historical sense they are both 128-bit quantities with a definite (although variant) construction. --Dmeranda 21:09, 10 August 2006 (UTC)
--d-axel 08:08, 26 August 2006 (UTC)-- I am not an expert but even so I could easily see through the possible bias of the original article here. I enjoyed and learned a lot. Beginning a comment with a very derogative expression is not helping anybody, so I would guess that User:212.114.211.8 has had many annoying situations where his expertise was neither heard nor acknowledged. But please, number 8, do moderate yourself before moderating others. You sound like a true expert! :-)
A non-biased (Neutral Point Of View) version should emphasize that PXE boot is influenced by the fight for market shares and possibly for control over the consumer electronics market.
There is a strong focus on policies which can use ID of machines for control due to license and copyright earnings. This is natural and should not be viewed as a non-neutral point of view.
It should not take much space in the final article, though.
As an example of how licensing strategies influence other unrelated areas I would like to mention system deployment, unfortunately also an area for experts only: Deployment of an operating system with 1 GB of stuff need only take 3 minutes on an average 100 Mbit network if there were no time-consuming license mechanisms involved. For the non-expert it might very well take half an hour and much more if no special tools are used. --d-axel 08:08, 26 August 2006 (UTC)
- It seems that all editors generally agree on what should be done and what bias is here. I suggest to just be bold and go ahead, making whatever changes are needed to fix it. Or is there any controversy? CP/M comm |Wikipedia Neutrality Project| 01:37, 17 September 2006 (UTC)
Response to claim of non-neutral point of view
Wow, there's been a lot of discussion since I last looked. Thanks for the input. I don't know how anyone can claim to be an expert on GUIDs, any more than they can claim to be an expert in 2's complement binary integers. They are both just data types, and we only know about how we've used that data type in the past.
I think I can qualify as an expert on Microsoft's use of the GUID, but I'll leave that to the community to decide. I won't claim to have an entirely non-biased point of view; I think all of our POVs are biased to some degree or another by our past experiences. I've been doing development work for over 25 years. I used DCE RPC and provided training and consultation on COM development, both as an independent consultant and later as a Microsoft employee. You'll find me acknowledged as a contributor to the book Inside OLE 2 by Kraig Brockschmidt (Microsoft Press, 1993) ISBN: 1556156189, and later in the 2nd edition of this book.
That said, I'll agree with most of the comments on this discussion page, save those of 212.114.211.8. His/her comments are offensive to me. I'm just trying my best to contribute some knowledge while remaining as neutral as possible. It was my understanding that a discussion page, like this one, is a place where potentially biased thoughts were to be discussed, and I've tried not to let my POV bleed into the main article.
I agree 100% that GUID and UUID are, from a practical standpoint, interchangeable terms. Merging the articles might make sense, but I'll note that I've encountered the term GUID more in a Microsoft-related context and UUID in a broader context, but I certainly wouldn't ever claim that Microsoft invented the concept. Personally I think Globally Unique ID is a more accurate description than Universally Unique ID, the universe is a pretty big place (do we even know how big?), so GUID is my preferred term.
I don't fully understand some of the points d-axel is raising. It's true that PXE boot can use a GUID/UUID as a machine identifier. This has something to do with the SMBIOS standard. My understanding is that the option to use GUIDs for boot-time machine identification is simply to eliminate the ambiguity of using MAC addresses on machines with more than one network adapter. What puzzles me is that I'm totally unaware of how this is connected to a fight for market share. I'd be interested in learning more about this, though I agree it doesn't seem appropriate in an article about the data type. Perhaps it belongs in an article about PXE boot.
I don't know how disputes over NPOV are resolved here, but I want to contribute where I can. I'd be willing to tackle reorganizing the main topic if there's general agreement to it, but with discussion questioning my neutrality, is that appropriate?
--Burt Harris 19:16, 16 September 2006 (UTC)
- It seems that the problem is not the neutrality of the text, but just a lack of information, which makes it too Microsoft-centered. Please just forgive the IP user for some offensive comments; it can happen sometimes, so let's just look only at the arguments. Merging the articles could be considered as well - what are other editors' opinions? CP/M comm |Wikipedia Neutrality Project| 01:54, 17 September 2006 (UTC)
- Thanks CP/M. That's a good way to look at it. I was recently referred to a great document, ITU Recommendation X.667, which clarifies several issues, including saying clearly that GUID and UUID are effectively synonyms. Unless there are objections, I'll make an update to incorporate this information, and merge the topics. --Burt Harris 20:43, 27 September 2006 (UTC)
- I'm all for a merging of UUID and GUID. I also agree that the POV concerns really reduce to just a lack of information beyond the MS world, which merging the articles will help resolve. Although UUIDs came first, I suspect most of the world nowadays is more familiar with GUID, so I think merging UUID into GUID makes the most sense. There is perhaps some structural/organizational editing work to be done too, but let's give the merge a shot first and then we can polish the resulting article after. I'd suggest that you go ahead and add the appropriate merge tags to the top of each article. [Oh, glad to meet you Burt, that OLE book is one of the few MS-related books I ever bought; I'm normally a Unix guy with a DCE background] --Dmeranda 17:00, 28 September 2006 (UTC)
- Merging is not the correct approach. As an encyclopedia we should reflect the view of "a common ground" and not try to squeeze the world into our own look. As we see in this discussion, people differentiate between UUID and GUID, because of technical and social differences. GUID is more Microsoft-centric and a word often used for the GUID datatype in ActiveX and MSSQL databases, where it is clearly defined. GUID can be used independently of Microsoft, but people who use it may be thankful to know the correlation. The term UUID is used in a similar but, it seems, different context. Both articles should remain, but give information about these aspects so that the general public can learn from our insights. Leobard 13:52, 15 March 2007 (UTC)
-- ITU-T ASN.1 Project leader, 29 September 2006
The introduction of ITU-T Rec. X.667 | ISO/IEC 9834-8 "Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components" that you can freely download from the ITU-T website and that is technically aligned with IETF RFC 4122 states:
"UUIDs are also known as Globally Unique Identifiers (GUIDs), but this term is not used in this Recommendation | International Standard. UUIDs were originally used in the Network Computing System (NCS) and later in the Open Software Foundation's Distributed Computing Environment (DCE). ISO/IEC 11578 contains a short definition of some (but not all) of the UUID formats specified in this Recommendation | International Standard. The specification in this Recommendation | International Standard is consistent with all these earlier specifications.]"
So this is also in favour of merging the GUID article into the UUID article.
- I would not say that, this is only a reason to prefer one view (of the ITU) over the view of another community (microsoft-aware developers). The ITU quote given states that the term GUID is not used in the recommendation. As with any standard, it can only reflect the view of one particular person (the editor of the standard) and the agreement of many (the signers/authority behind the standard). We should refer to the UUID article from the GUID article, but not merge them. Leobard 13:59, 15 March 2007 (UTC)
- Just to clarify, when Microsoft started to use GUIDs, they bragged loudly about the fact that this was part of their implementation of DCE's RPC standards (where they were called UUIDs). And the UUID RFC was partially written by a Microsoft employee. So I don't think there can be any doubt that GUID and UUID are synonyms, and always were. Besides, Microsoft also uses various other abbreviations ending in ID to refer to UUIDs that refer to specific types of objects (IID, CLSID, ProgID etc.). The only place where Microsoft's implementation differs significantly is that in binary representations, Microsoft often stores and transmits the little-endian byte order of the UUID fields, where most others use the big-endian byte order that matches the order of the hex digits in the text form.77.215.46.17 (talk) 00:07, 19 April 2011 (UTC)
BTW for those interested, a website is associated to the UUID International Standard.
Duplicates
This article used to specify the odds of generating a duplicate given a supercomputer running since the beginning of time. It used a naive equation of (total generated guids)/(maximum number of guids). That equation is wrong for calculating the odds of a collision. See Birthday Paradox. The number is much smaller. In fact given the approximate total number of guesses they were using and the birthday paradox equation I get the odds of a duplicate being roughly 1 : (1 - 10^(-10^26.876)). So close to one to one as to be almost indistinguishable. Not the 1 to hundreds of thousands as previously written. For ease of calculation I rounded the total number of guesses to 1.08405 * 10^33 and came up with this equation (2^128)!/(((2^128)^(1.08405 * 10^33)) * ((2^128)-(1.08405 * 10^33))!). The results are about what I expected. Bobprime (talk) 15:38, 24 February 2011 (UTC)
- Maybe you like the formula in the UUID article better; it would support the merge proposal (see above). –89.204.153.224 (talk) 09:47, 19 July 2011 (UTC)
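The exact factorial expression above is impractical to evaluate directly, so here is a rough Python sketch that uses the usual birthday approximation instead, P(no collision) ≈ exp(-n(n-1)/(2N)). That approximation is my assumption, not something taken from the article; it is reasonable here because n is far smaller than N. N and n are the figures quoted in the comment above.

```python
import math

N = 2 ** 128              # size of the full 128-bit space
n = 108405 * 10 ** 28     # 1.08405 * 10^33 generated values, as quoted above

# Birthday approximation: ln P(no collision) ~= -n(n-1) / (2N)
ln_p_no_collision = -(n * (n - 1)) / (2 * N)

# Convert to base 10, so that P(no collision) ~= 10 ** log10_p_no_collision
log10_p_no_collision = ln_p_no_collision / math.log(10)

# P(no collision) is far too small for a float, so report it as
# 10^(-10^x) and print x.
x = math.log10(-log10_p_no_collision)
print(round(x, 3))
```

The result agrees with the 10^(-10^26.876) figure above to within rounding, so under this approximation a duplicate is a practical certainty for that many randomly generated values.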
Byte order
Is there an actual standard for how GUIDs are to be stored in binary form? The table in the article indicates big-endian for all fields, but Apple's "Technical Note TN2166, Secrets of the GPT" indicates that with GUID partition tables, the first three fields are stored little-endian and the fourth field is stored big-endian. My work with GUID partition tables bears this out. --Efalk (talk) 22:23, 11 July 2012 (UTC)
- RFC 4122 and most other official GUID/UUID standards require the fields to be stored and transmitted in big-endian form. But Microsoft programs running on little-endian CPUs use the little-endian form in memory and in private file formats. This sometimes spills into public file formats and protocols, such as UEFI GPT partition tables. In other words, this is a design bug in the GPT disk format, which cannot be fixed without ruining lots of disks that are already in use. (Note: the last GUID field is an array of 8 bytes, so its byte order is the same in the little-endian form as in the big-endian form.) 77.215.46.17 (talk) 21:14, 13 August 2012 (UTC)
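For comparison, a short Python sketch of the two layouts discussed in this thread. It assumes the standard-library uuid module, whose bytes attribute gives the all-big-endian form and whose bytes_le attribute gives the form with the first three fields little-endian. The GUID is only the example value used elsewhere on this page, not a real partition type.

```python
import uuid

g = uuid.UUID("3F2504E0-4F89-11D3-9A0C-0305E82C3301")

big_endian = g.bytes        # every field big-endian, matches the text form
mixed_endian = g.bytes_le   # first three fields byte-swapped

print(big_endian.hex())
print(mixed_endian.hex())

# The final 8 bytes are a plain byte array, so they are identical in both.
tail_unchanged = big_endian[8:] == mixed_endian[8:]

# Reading the mixed-endian bytes back with bytes_le restores the same GUID.
round_trip = uuid.UUID(bytes_le=mixed_endian)
print(tail_unchanged, round_trip == g)
```

Anyone parsing an on-disk GUID therefore has to know which of the two layouts the format uses; the 16 bytes alone do not say.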
2^128
I've removed the following from the lead section:
- The total number of unique keys is 2^128 or 3.4×10^38. This number is so large that the probability of the same number being generated randomly twice is negligible; however, GUID numbers are not always generated randomly.[1]
I had trouble figuring out how to phrase it in a way that was actually useful but not misleading. I considered "the number of possible GUIDs" or similar, but it's unclear how you count the "backward compatibility" or reserved variants, and random GUIDs only have 122 random bits. ⇌Elektron 03:12, 11 July 2012 (UTC)
- I will put it back but phrased differently. 77.215.46.17 (talk) 21:16, 13 August 2012 (UTC)
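The 122-bit figure mentioned above can be sketched as follows in Python, assuming the RFC 4122 layout for random (version 4) identifiers, where 4 bits hold the version and 2 bits hold the variant. The constant names are only illustrative.

```python
import uuid

g = uuid.uuid4()   # a random (version 4) identifier

VERSION_BITS = 4   # fixed to 0100 for version 4
VARIANT_BITS = 2   # fixed to 10 for the RFC 4122 variant

random_bits = 128 - VERSION_BITS - VARIANT_BITS
distinct_random_values = 2 ** random_bits

print(g.version, g.variant == uuid.RFC_4122)
print(random_bits, distinct_random_values)
```

So 2^128 counts every possible 128-bit pattern, while the number of distinct version 4 values is 2^122, which is a factor of 64 smaller.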
Algorithm section too vague and sparse
The algorithm section should actually describe and illustrate the basic/standard algorithm used to generate GUIDs. Ouizardus (talk) 18:13, 13 December 2012 (UTC)
Microsoft bias
The "Subtypes" section refers to Microsoft COM as if it's the main focus of the article. An article on GUIDs shouldn't have an entire, generically-named section for a Microsoft specific implementation detail. There is already far too much Microsoft bias/boosterism on Wikipedia as it is. — Preceding unsigned comment added by 82.9.176.129 (talk) 00:27, 20 May 2013 (UTC)
What's the chicken and egg problem using sequential row IDs?
The line "database servers can use GUIDs to create unique row identifiers, solving the chicken and egg problem inherent with sequential row IDs." puzzles me. Joepnl (talk) 20:45, 13 June 2011 (UTC)
- Agreed - and the link to the article is no help since there is nothing DB related on that page. Psu256 (talk) 19:07, 14 July 2011 (UTC)
- It might be related to some locking mechanism that would be unnecessary using GUIDs? Maybe services like Twitter could use such a mechanism to be able to insert new rows in many different databases, replicating later? But even then it's not a chicken and egg thing (and a waste of resources, too). So, I deleted it and hope to see it reinserted with a mighty interesting argument why GUIDs would not be needlessly cumbersome in a database. Joepnl (talk) 00:20, 21 July 2011 (UTC)
There are several related issues:
- Sequential IDs require a source of a sequence of unique numbers. In an application, that has to come from a database server, because only the database server can be relied upon to know about all of the different clients that are connected to it and trying to get a unique ID, so only it can ensure that the IDs are unique. But there's a problem in the phrase "a database server": it means one database server, not a collection of them. Modern large applications often use farms of database servers, and often those can have different subsets of the data. That means there isn't one source of a unique sequence, because each database server has its own sequence. Using a GUID or UUID is an attempt to solve this problem, letting an application generate its own unique IDs that will be unique regardless of which database server it is working with. Unfortunately it's likely to destroy relational database performance, as explained later.
- There's a related problem: are gaps allowed in a sequence? If they are not, this imposes a very severe throttle on the work rate, because the server can't hand out a new ID with certainty until the last transaction that it gave an ID to has successfully committed. So a no-gap guarantee would not be a good move for a database server. This or item 1 is probably the chicken and egg problem referred to in the original text.
- Generating sequences can be slow due to the related transactional and other overheads of maintaining consistency and concurrency. This turned out to be a significant performance problem with MySQL database servers that was resolved by introducing a less strict rule for generating sequence numbers.
- GUIDs or UUIDs are a possibly major performance problem. The effectively pseudo-random nature of GUIDs/UUIDs, with very rapidly changing most significant bits, causes the GUID/UUID as a primary key to have values that span the whole range of possible primary key values. This means that as data is inserted, the working set size that the database server has to cache tends towards being the whole database size. Since caching algorithms rely on caching hot data, the effect on performance can be dramatic. A naive application developer who doesn't know about database server issues might think that they are being clever by using GUIDs or UUIDs but in fact destroy the performance of their application. The generic fix is easy: use some slow-changing values at the start of the ID. That can be a few bits of a date or time, part of a machine or server ID, or anything else that doesn't change rapidly. The result is a set of insertion points rather than insertion over the whole range of possible primary keys. Date-related values would often be a good choice because much real-world data is used more for recent things than old ones, and partitioning is often used to reflect this. It's likely to be quite good practice to have the data relating to the most recent information in only a few partitions, rather than spread around all possible partitions. One partition could be a performance bottleneck due to partition management related locking needs.
So GUIDs/UUIDs and databases can be quite unhappy partners. It takes a bit of knowledge of the whole application stack to know how to do things in a way that works for both the caching performance and uniqueness objectives. Jamesday (talk) 10:02, 8 July 2015 (UTC)
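The "slow-changing values at the start of the ID" idea described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a standard algorithm: `time_prefixed_id` is a hypothetical helper, the split of 48 timestamp bits plus 80 random bits is an arbitrary choice, and `uuid.UUID` is used only as a convenient 128-bit container (the result carries no valid version field).

```python
import os
import time
import uuid


def time_prefixed_id(now_ms=None):
    """Hypothetical helper: 48-bit millisecond timestamp + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Slow-changing prefix: most significant 6 bytes hold the timestamp.
    prefix = (now_ms & ((1 << 48) - 1)).to_bytes(6, "big")
    # Fast-changing suffix: 10 random bytes keep IDs unique within a millisecond.
    return uuid.UUID(bytes=prefix + os.urandom(10))


earlier = time_prefixed_id(now_ms=1_000)
later = time_prefixed_id(now_ms=2_000)

# IDs created later sort after IDs created earlier, so new rows land
# near the end of the primary-key range instead of all over it.
assert earlier < later
print(earlier, later)
```

IDs generated within the same millisecond still sort in random order relative to each other, so this gives a moving insertion region rather than a strict sequence.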
Pseudo Random Numbers
Using pseudo-random numbers sounds badly thought out. Many implementations of pseudo random numbers have as few as 4 billion different seed values, and that just isn't up to the job. I've already seen some 'homebrew' GUID generators online (javascript etc) that seem to suffer from that defect. I think that as well as mentioning pseudo randoms, you might want to mention that the tricky task of adding additional entropy or randomness is needed. 108.171.128.188 (talk) 15:48, 4 February 2016 (UTC)
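The seed-size concern above can be illustrated with a short Python sketch. Both functions are hypothetical examples written for this discussion: `weak_guid` stands in for a homebrew generator driven by a small seed, and `strong_guid` draws its bits from the operating system's entropy source via the standard `secrets` module.

```python
import random
import secrets
import uuid


def weak_guid(seed):
    """Homebrew-style generator: the whole GUID is determined by the seed."""
    rng = random.Random(seed)
    return uuid.UUID(int=rng.getrandbits(128), version=4)


def strong_guid():
    """Random GUID built from 16 bytes of operating-system entropy."""
    return uuid.UUID(bytes=secrets.token_bytes(16), version=4)


# The same seed always yields the same GUID, so a generator seeded from a
# 32-bit value can produce at most 2**32 distinct first GUIDs.
assert weak_guid(42) == weak_guid(42)

print(weak_guid(42))
print(strong_guid())
```

Passing `version=4` makes the `uuid.UUID` constructor overwrite the version and variant bits, so both functions return well-formed version 4 values; only the quality of the remaining 122 bits differs.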