Wikipedia:Reference desk/Archives/Computing/2015 October 1

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


October 1

Fast[er] file extractor

Hi,

Are there file unpackers on the market faster than the latest version of WinRAR? I've seen many benchmarks comparing competitors (e.g., 7-Zip, WinZip), but most of them are about compression ratio or the time required to compress files. For example, a 1.5GB 7zip file takes around 2 or 3 minutes to unpack with WinRAR on my PC, despite an 840 Evo SSD. Am I asking too much? Some on the web recommend increasing the process priority, but I'm afraid of it messing with the files' integrity. Matt714 (talk) 02:24, 1 October 2015 (UTC)

I'm fairly sure most decompressors for most file formats tend to use the same library, so you'll see little difference between 7-Zip, WinZip or WinRAR when decompressing the same file. It's unlikely process priority will change anything unless something else on your computer is using significant CPU time; in any case, it won't affect file integrity. For something like 7-Zip LZMA2 at maximum settings decompressing large files, the process will often be CPU-limited rather than IO-limited, perhaps even on a recent HD decompressing to the same HD (although this will depend a bit on how the OS and the decompressor cache, so it could vary between 7-Zip, WinRAR and WinZip). So your SSD may very well be irrelevant. Nil Einne (talk) 09:07, 1 October 2015 (UTC)
Actually, from further testing I'm not sure decompression is normally CPU-limited, even with LZMA2 at maximum settings, although I was too lazy to test fairly compressible files of large sizes. However, I'm also fairly surprised it takes 2-3 minutes to decompress a 1.5GB file. Are you dealing with lots of small files by any chance? SSDs are a lot better than HDs at dealing with lots of small files, but they're still going to be a fair amount slower with them than with large files, probably depending also on the file system. Alternatively, if you're dealing with extremely compressible files and extracting something like 75GB from the 1.5GB file, then SATA-600 is limited to 600MB/s at most, and not all SSDs can even achieve that maximum throughput, so it's not exactly surprising it would take 2+ minutes to write, even if you only have a small number of highly compressible large files. Nil Einne (talk) 17:02, 1 October 2015 (UTC)
Compression is usually CPU-limited; however, decompression can be RAM-limited. Modern decompressors use a lot of RAM (64 MB during decompression is far from unusual), and the accesses across that RAM buffer are mostly random (if they were not, there would be some sort of pattern which could be exploited for further compression), so you'll encounter lots of cache misses and wait states on the CPU.
Raising the priority doesn't achieve much unless the CPU is highly contested (that is, unless there are many processes competing for CPU time), and multi-core CPUs usually are not. There might be several CPU threads, but most of the time the decompressor will either already be using more than 90% of the CPU time, or will use all the cores it can while the other cores sit close to idle. - ¡Ouch! (hurt me / more pain) 14:44, 6 October 2015 (UTC)
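A rough way to check where the bottleneck actually is, assuming the 7-Zip command-line tool (7z, or 7z.exe on Windows) is available; the archive name and output directory below are placeholders:

 # Built-in benchmark: reports separate compression and decompression ratings for the CPU.
 7z b
 # Time a real extraction (Unix-like shells; on Windows, watch the clock and Task Manager instead).
 # If one or more cores sit near 100% for the whole run, the CPU rather than the SSD is the limit.
 time 7z x archive.7z -oextract_test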

Thanks for your input. I just re-tested the 1.5GB file, and it took around 2 minutes (exactly) to extract (not counting the time Windows took to move the files) -- seems like I was overestimating the required time. My system has an i7-4930K with 16GB of DDR3. I also tested 7-Zip, and instead of 1m57s it took 2m01s; so WinRAR it is. Matt714 (talk) 21:14, 1 October 2015 (UTC)

With some more testing, I return to my original statement that decompression may very well be CPU-limited, presuming we're talking about something like LZMA2 with maximum compression, even with large files. Anyway, how many repetitions did you do? A 4-second difference would likely be within the margin of error of any test unless you did at least 5 runs per decompressor, including multiple restarts, and saw only around a second of variation between repetitions.

I am a bit surprised, given your specs, that it's taking so long if you're referring to large files. My own testing on a much weaker (single-threaded) CPU took slightly under 2 minutes for a 3GB file. However, it will probably depend on precisely how well the file compressed. But are you sure you aren't decompressing a lot of small files? If it's IO-limited, your CPU is probably largely irrelevant.

BTW, why are you moving stuff around rather than extracting directly into the desired location? That could easily waste more than 4 seconds.

Nil Einne (talk) 16:06, 2 October 2015 (UTC)
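For anyone repeating the test, a minimal sketch of several timed runs in a Unix-like shell with p7zip installed (archive.7z and the output path are placeholders); the spread between runs gives a feel for the margin of error:

 # Five timed extractions to a throwaway directory.
 for i in 1 2 3 4 5; do
   rm -rf /tmp/extract_test
   time 7z x archive.7z -o/tmp/extract_test > /dev/null
 done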

7z files can also be bzip2-compressed; decompression is then much slower than with LZMA2: 5.28s against 2.23s for 85MB in this test. (At maximum compression level, LZMA2 used about 15 times more memory: 66 MB vs. 4 MB.) Ssscienccce (talk) 22:02, 3 October 2015 (UTC)
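That kind of comparison can be reproduced with 7-Zip's command line, which lets you pick the compression method explicitly (archive and directory names are placeholders):

 # Same data, two methods, both at maximum level (-mx=9).
 7z a -m0=lzma2 -mx=9 test-lzma2.7z somedata/
 7z a -m0=bzip2 -mx=9 test-bzip2.7z somedata/
 # Then time the decompression of each.
 time 7z x test-lzma2.7z -oout-lzma2
 time 7z x test-bzip2.7z -oout-bzip2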
In general, when designing compression algorithms there are always going to be tradeoffs between compression time, decompression time, and compression ratio. If you find you're spending too much time decompressing files, you might want to look for a different algorithm that's optimized for that (and that, yes, might not fare so well on compression time or ratio). —Steve Summit (talk) 13:35, 3 October 2015 (UTC)
The LZMA family is usually superior to PPMd in both archive size and compression/decompression time (by a large margin at decompression), if a bit heavy on RAM when compressing. Not that the latter counts if you have >4 gigabytes of the stuff. PPMd is usually inferior except with "text" files: English text, source files, logs, etc. However, if archive size is not critical, the LZMA family is your best bet, due to decompression performance, and it never seems to lose too badly to PPMd. I'd recommend LZMA2 over vanilla LZMA, too, because of the worse "worst case" of LZMA Uno. - ¡Ouch! (hurt me / more pain) 14:44, 6 October 2015 (UTC)
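PPMd can be selected the same way if you want to try it against LZMA2 on text-heavy data (file and directory names are placeholders):

 # PPMd often gives smaller archives on plain text; LZMA2 usually decompresses much faster.
 7z a -m0=ppmd -mx=9 logs-ppmd.7z logs/
 7z a -m0=lzma2 -mx=9 logs-lzma2.7z logs/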

Why hasn't flash memory become cheap enough to give away?

The per-byte price of flash-memory storage media, including USB flash drives and SD cards, has continuously dropped over the past decade or so. But the unit prices of commonly available ones haven't decreased nearly as much. While I can now get a 32 GB microSD card for a tenth of the price of a 512 MB SD ten years ago, it still costs in the neighbourhood of $10. Why doesn't anyone make 512 MB cards that now sell for less than a dollar, that could serve as a replacement for optical and floppy disks? While online file transfers have largely taken over, there's still a need for offline media that one can cheaply give away. Are the per-unit fixed production costs so high as to disallow such an approach? Or am I the only one who would want such a product? --Paul_012 (talk) 06:26, 1 October 2015 (UTC)

I don't know where you are located, but in India pen drives with your 512 MB storage capacity are sold for Rs 100-150 (the same as a local beer, or $1.5-$2). If they are sold for $1.5-$2 as a unit including postage, I suppose their market price for businesses might be less than the $1 that you refer to. I don't know how this would compare to the floppy disks of past ages, but it's certainly cheaper than printing hundreds of pages. --Scicurious (talk) 07:01, 1 October 2015 (UTC)
Yea, it's all that extra junk that goes with it. There's the connector, the plastic case, the package, someone's time to stock it and sell it in the store, etc. They might be able to reduce some of that. I have one flash drive with no case; it's just a circuit board and a connector. Very compact and cheap, but also ugly and fragile. They could maybe sell a package of 100 of those in a 512MB size for less than $100 (less than $1 each). If there was enough demand, I'm sure they would. StuRat (talk) 14:00, 1 October 2015 (UTC)

@Paul 012, Scicurious, and StuRat: Hardware producers focus on profit; there aren't enough people who would buy mass-produced low-capacity flash drives to make it profitable.

Are you sure it hasn't? USB sticks are a pretty common giveaway at trade shows etc., and Alibaba shows lots of retailers who'll sell USB sticks in bulk (1 GB for under $1). Smurrayinchester 15:47, 1 October 2015 (UTC)
(One extra point on giveaways - one disadvantage of flash media is that it's difficult to write in bulk. You can fairly easily get machines which accept a big pile of blank CDs or DVDs and will just write them, label them and spit them back out, but for most companies writing a file to a big pile of USB sticks or memory cards still has to be done manually (unless you can have it done at the manufacturer during formatting). You can make it easier by using USB hubs/multi-card readers, but it's still far more hands-on, and still more expensive, than a cheapo optical disk.) Smurrayinchester 12:05, 2 October 2015 (UTC)
I have a drawer full of USB sticks that I've been given. Some don't work at all and I toss them in the garbage. Some don't handle rewrites and shrink in size every time I delete a file. A few (very few) are of a quality I'd pay a tiny bit for. The only real use I've found is to put movies on them for my kids so they can plug them into the TV and watch them. Turns out that USB sticks don't scratch and turn to crap as easily as DVDs. 199.15.144.250 (talk) 17:17, 1 October 2015 (UTC)

Where are the files of a tool in Linux?

How can I find out what files a tool is using? I know that the binary 'fortune' is stored at /usr/games/fortune, and that the data is stored at /usr/share/games/fortunes. The first can be discovered with 'whereis fortune', but what command would output the second? Or simply output a list of directories being accessed by a tool, be it for reading or writing files? That is, how can I monitor what a tool is doing? --Scicurious (talk) 13:30, 1 October 2015 (UTC)

lsof will get you most of the way there... but it takes some skill to use and understand lsof output. It also won't tell you if the file access is more indirect, e.g. if a helper process or daemon is involved.
For me, the more interesting question is: how can I determine which source code was used to build a specific executable program that is on my *nix system? This is more useful to me; with source, I can determine and debug program behavior (including, but not limited to, file system access). Unfortunately, finding source can be a lot more difficult: essentially, you depend on your software vendor to maintain a complete "reverse look-up" database that maps specific source projects to specific files that ship with the "distro." Few distributors make this procedure easy; there is often a lot of hunting and guessing. My favorite commercial Unix distributor makes this lookup process much easier, but their software is regrettably non-free. Linux, on the other hand, is free - but the source comes from thousands of places and is managed by an uncountable number of independent contributors and organizations. So, if I boot up my trusty and reliable Ubuntu 9.04 box and find "fortune" on my disk at /usr/local/bin/ ... it is not easy to know who authored, built, and delivered that version of the binary. I have to dig through old Canonical archives, mailing lists, and FTP servers; and I have to already know where to look.
Nimur (talk) 14:10, 1 October 2015 (UTC)
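For the lsof route, a couple of typical invocations; note that lsof only shows files that are open at the instant it runs, so it works best on long-running programs (the process name and PID below are placeholders):

 # Files currently open by any process whose command name starts with "fortune".
 lsof -c fortune
 # Files currently open by a specific process ID.
 lsof -p 12345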
Use rpm -ql package-name to see what files belong to a package (on RPM-based distros; there must be a similar command for dpkg). You can monitor file access by tracing a process's syscalls. For me, strace worked well; I've used it on two or three occasions to investigate bugs (in unrelated software) which I would later report to the maintainers. Asmrulz (talk) 16:26, 1 October 2015 (UTC)
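A short sketch of both approaches; the package name fortune-mod is a guess (check with your package manager), and the %file filter assumes a reasonably recent strace:

 # Which package owns the binary, and what files that package installed.
 dpkg -S /usr/games/fortune     # Debian/Ubuntu; use rpm -qf on RPM-based distros
 dpkg -L fortune-mod            # list the package's files; rpm -ql fortune-mod
 # Trace only file-related syscalls to see what the program touches at run time.
 strace -f -e trace=%file fortune 2>&1 | grep open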
The above answers are useful, but for data that "belongs" to a certain program (like fortune), if the packager follows standards like the Filesystem Hierarchy Standard (FHS), you can just look at the standard directories. In this case, fortune is following the FHS, and so it keeps the fortune lists in /usr/share/games where they belong. The "Unix way" has long been to store files in a standardized directory tree, rather than just scattering them wherever as is the norm on some other platforms. See Unix filesystem and man 7 hier. --71.119.131.184 (talk) 21:53, 1 October 2015 (UTC)
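A quick way to confirm this on a given system:

 # hier(7) documents the standard directory layout.
 man 7 hier
 # fortune's data files, where the FHS puts architecture-independent game data.
 ls /usr/share/games/fortunes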
One technique that often works is to run strings on the (binary of the) program in question. Often this will reveal the hard-coded pathnames the program will use. (And you could narrow the search down with strings | grep / or something.)
Another (even more hard-core) technique is strace, which lets you snoop on all the system calls (and, in particular, all the file-opening calls) a running program is making. —Steve Summit (talk) 13:28, 3 October 2015 (UTC)
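Concretely, using the binary path from the question above (output will vary by build):

 # Pull printable strings out of the binary and keep the ones that look like absolute paths.
 strings /usr/games/fortune | grep '^/'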

I broke the Assume Good Faith option for myself

Moved to WP:VPT. Dismas|(talk) 13:41, 2 October 2015 (UTC)

I see no result for "Geo-IP", why?

Geo-IP doesn't work; is this user using an open proxy? --Poker chip (talk) 16:55, 1 October 2015 (UTC)

It works for me, giving the following results:
IP address: 112.198.82.115
Provider: Globe Telecoms
Region: San Juan (PH)
This is similar to the info in a WHOIS [1].
I take it you're aware that Geo-IP loads a Google Map on which it shows the results, and this may only happen after it loads an ad (the site also seems a bit slow). Even so, in future, if you have problems, try using a different geolocation service in case the one you're using is playing up or simply lacks info for that particular IP for whatever reason. Note that the absence of a result from any geolocation service is probably not an indication of an open proxy.
Nil Einne (talk) 17:14, 1 October 2015 (UTC)
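A cheap cross-check, assuming a whois client is installed, is to query the registry data for the IP directly; it won't give a precise location, but it does show the registrant and country:

 whois 112.198.82.115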
I don't understand this, but yes, I now see a result on Geo-IP; at first I didn't get any result. I saw the map and an error message saying that no location could be found for this IP. I saw this error twice and wondered why nothing was shown. And I think it is an open proxy, because he has edited from the German IP but is located in a country miles away. --Poker chip (talk) 23:00, 1 October 2015 (UTC)
Geolocation services do have problems, which is one reason why it's generally a good idea to check another service if you're getting unexpected results. I'm not sure what you mean by "he has edited from the German IP". If you mean the editor has edited from a German IP before, well, there are only about 9 edits for that IP, and there's nothing that looks like it would establish a clear link to another editor, so you may simply be mistaken about who this editor is. Even if you are correct, it's always possible that the editor went on holiday in the Philippines. (Although the IP seems to have edits spanning about 3 months, so it would likely have been a long holiday.) Note that someone from the Philippines editing the German Wikipedia isn't a definite sign of an open proxy; there are definitely German speakers in the Philippines, including German tourists, expats and migrants as well as locals who've learnt German. German may not be English, but it isn't exactly the Njerep language either. Nil Einne (talk) 15:53, 2 October 2015 (UTC)