Wednesday, November 29, 2006

A long war story about a Beldar cross-examination, and a technical bleg about "meta-data"

This is a technical "bleg" (meaning a beg for help from blog readers) about "meta-data," a/k/a "embedded data," in digital files. I'm hoping that a few of my readers might be able to answer my technical questions, but others might still find the technical question interesting — especially the lawyers among you — in which case you may well want to skip to the very end of this very, very long post.

But as a roundabout way to explain why the subject of this bleg could be important, I've included a trial lawyer war story, which in turn includes long quotes from one of my cross-examinations in a recent tradename injunction case. It mostly falls into the category of Beldar self-congratulations (ow! my arm's broken from patting myself on the back!). Arguably it also falls into the category of "educational examples of how to impeach a witness effectively from a prior affidavit on cross-examination" — affidavits and their uses and abuses seeming to be a subject of obsession for my legal blogging, I guess.

******

Back in October 2004, when the conservative blogosphere was very busy shredding the claims by CBS News, its "60 Minutes" program, and its then-Grand Poobah Dan Rather about the authenticity of the "Killian Memos," a/k/a the "Texas Air National Guard" documents forgeries, I was much impressed with how some of my readers and commenters began to delve into the "meta-data" (also sometimes called "embedded data," and yes there's a slight distinction but I'm not sure I can explain it) associated with the various .pdf files containing scans of the documents forgeries.

In addition to the .pdf scans of the so-called Killian Memos that were available for download on the CBS News website, there were other .pdf files containing scans that were purportedly made from the same original documents on various other news organizations' websites (e.g., USA Today's site), some of which those news organizations claimed to have obtained independently of CBS News' (so-called) investigatory efforts. At one point, based on a date that was embedded in one such .pdf file, it looked like Fox News' version of the scan had been created many months before CBS News claimed to have been approached — which seemed like a big, big deal at first. But then my commenters seemed to reach a consensus that the most likely explanation was that the scan had been done on a non-networked computer or scanner whose system clock was several months out of date, possibly due to a bad battery — a terrific example of the blogosphere's distributed information processing coming up with a non-conspiratorial explanation (even though it didn't fit what most of us expected, and probably wanted, to find by that point). And it marked the first time that I became aware of the possibilities that embedded data might have in an adversarial search for the truth (a/k/a "what I do for a living").

******

Fast-forward to the much more recent past — this October. What I'm about to describe are all matters of public record in a tradename lawsuit that has just ended through an agreed settlement. There's no confidentiality agreement as part of the settlement, but simply for taste reasons, I'm going to conceal the actual identities of the parties, witnesses, lawyers, and products behind pseudonyms.

The defendant in the lawsuit (that is, the alleged trademark infringer) I'll call "Doe Corp." The plaintiff, whom I'll call "Doe Inc.," asserted that it had a superior right to use the tradename "Doe." Both companies were based in Europe, where they'd done business side by side for centuries. Both of them manufacture what I'll call "widgets." And both were in fact founded and are still owned by families named "Doe." But Doe Inc. claimed that it had started using the "Doe" name in connection with its widgets in the U.S. and the Western Hemisphere several years ago, and that it had spent lots of time and money promoting the "Doe" name here, whereas Doe Corp. was, according to Doe Inc., a newcomer to the widget market in the U.S. and the Western Hemisphere. Doe Inc. also claimed that Houston is the widget capital of the Western world, and that Doe Corp. was causing customers here to become confused between Doe Inc. widgets and Doe Corp. widgets, in turn causing Doe Inc. to lose sales.

So Doe Inc. had gotten an emergency "temporary restraining order" in state district court in Houston that prohibited Doe Corp. from using the name "Doe" in this half of the world. Doe Inc.'s lawyers did so "ex parte" — meaning without anyone from Doe Corp. being present — based on Doe Inc.'s assertion that this was such a big emergency that there just wasn't time to give Doe Corp., all the way over in Europe, any notice of the hearing on Doe Inc.'s TRO application.

That very directly affected the business of my client, whom I'll call Acme. Acme is a Houston-based company that buys and then re-sells widgets from many companies, among them both Doe Inc. and Doe Corp. Because of the TRO, Doe Corp. suddenly couldn't sell Acme any more widgets — and the world-wide widget market is smoking hot right now, and Acme needs lots and lots of widgets as fast as it can get them! Indeed, Doe Inc. was even making noises about trying to use the TRO it had gotten against Doe Corp. to stop Acme from "acting in concert with Doe Corp." In other words, Doe Inc. was suggesting that Acme was deliberately helping Doe Corp. infringe on Doe Inc.'s tradename, even if just by re-selling the Doe Corp. widgets that Acme already had in its inventory. So even though Doe Inc. hadn't yet directly sued Acme, Acme instructed me to jump into the middle of this lawsuit (i.e., to "intervene") to protect Acme's own interests.

Thus it came to pass that in mid-October, we had a two-day evidentiary hearing on Doe Inc.'s application to convert its TRO into a longer-lasting pretrial injunction (called a "temporary injunction" in Texas state-court practice, but very analogous to a "preliminary injunction" in federal-court practice). This temporary injunction hearing was going to be a very big deal, potentially freezing millions of dollars of widget commerce for many months, perhaps even more than a year, until there could be a full jury trial on the merits after everyone had conducted pretrial discovery. And it was going to be conducted blind: in other words, without either side having obtained the other's documents or taken the other side's witnesses' depositions. This was going to all be "shoot from the hip" trial lawyering, by far the most dangerous, and by far the most fun (if terrifying) for the lawyers!

The final thing you need to know to understand this war story has to do with the rules governing injunctions — TROs, temporary injunctions, or permanent injunctions. A defendant can defeat an injunction by showing that the plaintiff was not diligent in trying to protect his rights — in other words, that the plaintiff knowingly let his rights be trampled for a long time without saying a peep. That's especially important in a trademark/tradename contest. So precisely when Doe Inc. first learned of the alleged tradename infringement by Doe Corp. in the U.S. was potentially very important — and could possibly even decide the outcome. "Why should I freeze everything for the next few months," judges are prone to ask, "when your client has known about this controversy, but sat on its butt without filing suit for several months, counsel?" There rarely is a good answer to this question.

******

At the evidentiary hearing, we heard from a witness whom I'll call "Mr. Smith." Mr. Smith works for Doe Inc., and his company's lead lawyer, whom I'll call "Mr. Black," called Mr. Smith to testify about supposed confusion in the marketplace between the two companies. But in the course of Mr. Black's direct examination of Mr. Smith, Mr. Smith volunteered that he'd gotten his first strong hint of Doe Corp.'s allegedly infringing use of the Doe tradename in the U.S. way back in February, and that he'd gotten firm confirmation of it in April. These were surprising admissions — harmful to Doe Inc., and very helpful for Doe Corp. and Acme. And when Mr. Black was done with his direct examination, Doe Corp.'s lawyer, whom I'll call Mr. White, very crisply and effectively re-confirmed and highlighted these admissions as part of his cross-examination of Mr. Smith.

But then came my turn. And because I recalled something that it seemed that the other lawyers in the courtroom either didn't know or had forgotten, I found myself with a textbook opportunity to conduct a very, very fun cross-examination. Here's the transcript, unedited (except for the substitution of pseudonyms and a few bracketed explanations):

THE COURT: Mr. Dyer?

CROSS‑EXAMINATION BY MR. DYER:

Q. Mr. Smith, besides testifying here today in Court, you've previously given a written affidavit in this case; is that correct?

A. Yes, sir.

Q. You signed it before a notary public on September 22nd, 2006?

A. Yes, sir.

Q. Do you know that it was attached to the papers that your company's lawyers filed with the Court to ask for a temporary restraining order?

A. I don't know that personally. I know I signed the affidavit.

Q. Did you read it before you signed it?

A. Yes, sir.

Q. Did you read it carefully?

A. I did read it.

Q. I'm sorry?

A. I did read it carefully.

Q. Carefully? Did you understand it was testimony that you were giving in written form, just as if you were sitting here in court, as you are today?

A. Yes, sir, I did.

Q. You certainly wanted that testimony to be accurate?

A. Yes, sir.

Q. You certainly wanted that testimony to be complete?

A. Yes, sir.

Q. You certainly didn't want to mislead this Court?

A. No, sir.

Q. Mr. Smith —

MR. DYER: May I approach?

THE COURT: You may approach.

Q. (BY MR. DYER) Let me show you what I marked as Acme TI 2, for Acme Temporary Injunction Exhibit No. 2. Do you recognize that to be a copy of your affidavit?

A. Yes, sir.

Q. Do you recognize your signature on the second page?

A. Yes, sir.

MR. DYER: May I look over the witness' shoulder with him? We only have this one copy, I know, because it was attached to the court papers. The other counsel have it. It's not necessarily right here in hand.

MR. BROWN [the second-chair lawyer for Doe Inc.]: Here's an extra copy, if you want one.

(Tendered)

MR. DYER: Thank you.

MR. DYER: May I hand this up for the Court?

(Tendered)

THE COURT: Thank you.

Q. (BY MR. DYER) If we read together in your affidavit, it says, paragraph two, that you're employed as a quality coordinator at Doe Inc. Correct?

A. Yes, sir.

Q. Then the rest of paragraph two describes the September 20th encounter with Mr. Jones from Acme?

A. Yes.

Q. Speak up so the court reporter can hear you.

A. Yes, sir.

Q. Then the last sentence of that paragraph reads, quote, "Prior to August of this year I was unaware that there was another company named 'Doe' that sold widgets."

Do you see that, sir?

A. Yes.

Q. That's false, isn't it?

A. I believe it would be April, is when I —

Q. The statement is false as written, isn't it?

The transcript doesn't show it, but everyone who was there in the courtroom will remember that at this point, there was a long, painful silence. I'd estimate the silence to have been at least 30 seconds, but it probably seemed much longer to Mr. Smith. And I could practically hear the gears turning in Mr. Black's head as he tried to think of some objection that might get his witness off the hook I'd carefully baited, set, and then yanked.

A. Yes, sir.

Q. Misleading as written, isn't it?

MR. BLACK: Objection. It's an insignificant, technical error. Not misleading.

MR. DYER: That's a fine argument.

THE COURT: Excuse me. Do you have a legal objection?

MR. BLACK: Badgering the witness. I gave a copy so he wouldn't hover over his shoulder.

THE COURT: All right. Let's — I'll sustain the objection. Well, I'm sorry, the first question was false?

MR. DYER: Yes.

THE COURT: Reverse myself. Overruled.

Q. (BY MR. DYER) That statement that the first time you knew there was another company named 'Doe' that sold widgets was in August of this year, 2006, that was also misleading, wasn't it?

MR. BLACK: Objection. At the time —

A. No —

Q. (BY MR. DYER) Do you think it was misleading?

A. Sir, I was only told by another employee there that it was Doe [something else], not Doe Corp.

Q. Well, any other company named 'Doe.' You denied in the affidavit that you knew there was any other widget company that used the name 'Doe,' you denied having known that before August of this year, and that was just wrong when you put that in the affidavit. Do you agree, sir?

A. Yeah, that was a mistake.

Q. As you testified here today, you knew at least as far back as April of this year?

A. Yes.

Q. You could have known, as Mr. White's questions established, as far back as February of this year, had you taken the trouble to look?

MR. BLACK: Objection. Mischaracterizes the prior questioning and prior answers.

THE COURT: Overruled.

Q. (BY MR. DYER) Could have known in February, if you looked?

A. I guess my only question is how come the Acme employees didn't look?

Q. I understand that that's an argument your lawyer may make later on. But my question to you is, could you have looked in February and found out as early as that, if you had taken the trouble?

A. Yes, sir.

This whole series was about as close to a real-life "Perry Mason moment" as any trial lawyer is likely to get. But there was more, near the end of that same cross-examination.

Q. [BY MR. DYER] While we're talking about reasons you didn't do things, is there a reason you didn't tell the Court in your affidavit when you said — the same affidavit now, which said you first learned about another 'Doe' company in August — is there a reason you didn't tell the Court in that affidavit about the April and February contacts with Bernard? [Bernard was the Acme employee whom Mr. Smith identified as having told him about Doe Corp. selling widgets in the U.S. in February and April.]

A. Can you repeat that one more time?

Q. Sure. Is there a reason you left out of your affidavit the February and April contacts with Bernard?

A. Is there a reason I left it out?  No, sir, there's no reason.

Q. In fact, somebody else wrote that affidavit for you to sign, didn't they?

A. No, sir.

Q. Did you type it up?

A. I did not type it up.

Q. Who typed it up?

A. I believe our attorney typed it up.

Q. And I don't want to get into conversations between you and your attorney.  But, is it fair to say that you weren't the one who made the decision to leave out the discussion of the February and April contacts?

MR. BLACK: Objection. There's no way to answer that question without getting into attorney/client communications.

MR. DYER: If that's the case, then we may need to talk about the crime fraud exception, have some testing [of] privilege.  I'm trying to avoid that.

MR. BLACK: Hold on a second.

THE COURT: Come on up, counsel.

(Discussion at the Bench)

MR. BLACK: May I?

THE COURT: Well, no. Why don't you examine what this witness knows about the transaction, short of what he was told by his lawyer.

If you want to cover the circumstances under which this affidavit was prepared, I think that would be appropriate.  But at some point we're getting into attorney/client privilege issues.  I don't want to pre-judge the crime fraud issue, but —

MR. DYER: Somebody made a decision not to tell this Court —

THE COURT: That you haven't asked. Your question assumed that. Did not ask that.

MR. BLACK: But I'm giving you this hypothetical. Hypothetical that —

THE COURT: I'd rather not do that in front of the witness.

MR. WHITE: Can we take testimony? Did you make the decision?

(End Bench conference)

Q. (BY MR. DYER) Did you make the decision to leave that out of the affidavit, Mr. Smith?

A. No, sir.

Q. Did you make the decision what to put in the affidavit?

A. I just told things as I knew them.

Q. I don't want to get into the substance of what you told the lawyers or didn't tell the lawyers. Is it fair to say you had a communication with them verbally, and then they handed you an affidavit and you signed it after reading it?

A. Yes, sir.

Q. After missing the [sarcastic tone and "air quotes" with fingers] mistake?

A. Yes, sir.

******

Based on Mr. Smith's live testimony about the dates in response to Mr. Black's questions, Mr. White's cross-examination had already highlighted the fact that Doe Inc. couldn't prove one essential part of its case — in other words, couldn't prove it had acted promptly to protect its supposed rights to the "Doe" name. But Mr. White's cross-examination hadn't quite shown that anyone was a scoundrel — only that they'd been rather slow to react.

My cross-examination took it a step further, however, impeaching Mr. Smith's personal credibility by pointing out the vast inconsistency between his live sworn testimony from the witness stand and his written sworn testimony from the affidavit. But even more important, my cross-examination showed that Mr. Smith's employer, Doe Inc., almost certainly had misled the court about this subject when Doe Inc. got the ex parte TRO. Judges don't like being misled on important things. Perhaps that misleading wasn't deliberate — Mr. Black continues to insist that this was all just an innocent misstatement, a memory lapse. But it was nevertheless on a subject so important that a "mistake" of this magnitude was not likely to be excused by the court even if innocent.

The evidentiary hearing ended a few hours later. My cross-exam of this witness was far from the only reason — Mr. White and his colleague did a terrific job on other important topics too — but suffice it to say that Doe Corp. and Acme won the hearing: The court denied Doe Inc.'s request to convert the TRO into a temporary injunction that would have remained in effect for several months until a full trial. Doe Corp. was free to go on selling its widgets throughout the Americas, and Acme was free to continue to buy and re-sell them.

And within a matter of a few weeks after the hearing, the whole case settled with a whimper, not a bang.

******

That's a very long war story to lead up to my technical bleg, but it gives some context for how the subject of my bleg can become important to lawyers.

Since I had established that this witness and his employer had submitted an at-least-badly-mistaken and possibly deceptive sworn affidavit to get the TRO, Mr. White and I were very keen to dig further into the subject of that affidavit as part of our preparations for a full trial on the merits, and the judge had indicated that we'd get a chance to test just how innocent the "mistake" about the dates actually was. We wanted to know who had prepared the affidavit, who had revised it, how many drafts it went through, what changes were made during the drafting stages — and as to each of these issues, when. As the first step, Mr. White had sent a document production request that sought "the production (in electronic form) of the affidavit of John Smith .... Pursuant to Tex. R. Civ. P. 196.4, Doe Corp. requests that the document be produced in electronic form, in native format with all associated metadata." But the case settled before we got Doe Inc.'s response (which would inevitably have been the next step in a complicated battle over attorney-client privilege).

I recently took a very good online continuing legal education course about meta-data and embedded data, prepared by Mercer University School of Law Professor David Hricik. (The pseudonymous Mr. White has also taught CLE courses on this subject, and already knows much more about it than I do.) As with the meta-data embedded within .pdf files that was discussed at the beginning of this post, Prof. Hricik's course taught me how to find some of the cool info that can be embedded within, and hence extracted from, Microsoft Word .doc files. The type and extent of the available data change depending on what settings one's Word program has as its defaults, and/or how the settings have been re-configured for any given document. But in my own experimentation, I've found that a great deal of that embedded data seems to be altered by any re-saving of the .doc file once it's been opened to look for that data. Plus, other types of embedded data, including when the file was created, appear to be re-set any time the file is even copied (for example, even onto a CD).

Hence, finally, my technical bleg: I can send rude instructions, such as a detailed warning that opening and then re-saving the Microsoft Word .doc file will be argued by me to be an intentional spoliation of evidence by my opponent, potentially subject to severe punishment by the court. But short of going over to Mr. Black's office and demanding to be allowed to log onto one of his networked workstations, given a password, and shown how to access the .doc file myself from where it's stored on his firm's servers (presumably with a videographer looking over my shoulder to document what I'm finding!), is there any good and fairly easy way to ensure that a digital file that's supposedly in "native format with all associated metadata" (but that almost necessarily will have been copied from an "original" file) won't have suffered some of these alterations?

Posted by Beldar at 10:40 AM in Law (2006 & earlier), Trial Lawyer War Stories | Permalink

Comments

(1) Neil Carpenter made the following comment | Nov 29, 2006 11:59:01 AM | Permalink

Insist on the use of read-only media (CD-R/DVD-R) to store the file after it has been collected.

(2) Beldar made the following comment | Nov 29, 2006 1:06:48 PM | Permalink

Aha. I had assumed that any software I used to copy a file onto the CD-R or DVD-R would reset the meta-data in the same manner that a re-save or a "save as" from within Word would do. But apparently that was a mistaken assumption. I just tried it, and indeed, the meta-data looks the same as the un-resaved version!

Well now I feel silly, but it's not the first time. Simple answer to my bleg, and still a very long war story. Thank you!

(3) jon made the following comment | Nov 29, 2006 2:08:54 PM | Permalink

Copying the file will not modify the contents.

To my limited knowledge, only photographic applications include some guarantee (via "watermarking") that the metadata (EXIF) is unaltered, and only if the camera sets this (e.g., Canon DVK-E2 Data Verification Kit). There are no such protections in standard word processors.

(4) laocoon made the following comment | Nov 29, 2006 2:12:44 PM | Permalink

It is easy to ensure that not one single bit is changed, thus ensuring that the metadata, embedded data, formatting, line breaks, and everything else are kept totally unchanged.

Sign and encrypt the file.

Then copy around the encrypted file. Whenever you decrypt it, you will get exactly, bit for bit, the file you originally encrypted - with a digital signature to verify that not one bit has changed.

PGP/GPG is good for this purpose.

(5) antimedia made the following comment | Nov 29, 2006 2:38:55 PM | Permalink

There's no "chain of custody" with metadata, so it's impossible to say if a document has been altered from the original short of a full forensic examination of the hard drive on which the document was created.

If the metadata impeaches your opponent in some way, then it's safe to say that he was unaware that the metadata was tracking his every move. If, however, the metadata appears to vindicate him, it is not proof of innocence; it is only the absence of proof of guilt. A full forensic examination may well turn up impeaching evidence, but you would have to weigh the stakes against the expense before proceeding, that expense being both in time and money and in strength of position should the investigation not bear fruit.

(6) Carlos made the following comment | Nov 29, 2006 4:12:07 PM | Permalink

Beldar, glad to see you back in action. You were right in the Harriet Miers discussions, and there are many readers (including a few Texas lawyers from the UT law class of 1980) who are still irritated about the whole matter, even though Judge Alito will probably be a superb justice. Your insight is valued by many, so pace yourself, lest we have to click on your link for a year between posts.

(7) Kent G. Budge made the following comment | Nov 29, 2006 4:15:50 PM | Permalink

Beldar's back! Yay!

Wonderful war story. Well worth any silly questions. Remind me to mind my Ps and Qs so I never end up in the witness stand!

(8) Phelps made the following comment | Nov 29, 2006 6:35:00 PM | Permalink

Copying the file to any media (even a CD-ROM) will destroy metadata. You'll lose the system accessed/modified dates on CD.

The way we have approached it is to immediately hire a forensic expert and have them actually go onsite and make forensic images of the entire hard drive(s). Then, the other side can muster up their own expert to check the image for privileged documents, and your expert can then dig out all the metadata (and give you a nice EED summary/database) for you to review.

It's hideously expensive, but the only good way to do it. Even turning the computer on will begin altering metadata, especially on servers. There are routines that run on a regular basis that overwrite deleted data (as a matter of function, not some nefarious intent) and log files that end up overwriting other parts of the old logs, etc.

(10) BC made the following comment | Nov 29, 2006 6:57:02 PM | Permalink

Beldar: The way you accomplish this is with checksums.

You'll want to take the original file and pipe it through a cryptographic hash function such as MD5, using a tool such as md5sum. This will generate a fixed-length string which is variously referred to as a hash sum, a message digest, or a digital fingerprint. Store this string on read-only media along with the original copy of the document.

When you're trying to verify the integrity of a downstream copy of the document, simply run the copy through the same hash function and compare the output with the hash sum of the original. If the documents are identical down to the metadata, the hash sums will match. Otherwise they won't.
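The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not a forensic procedure: the function names `file_digest` and `same_contents` are made up for this example, and the choice of algorithm is a parameter (MD5 because that is what the comment names; a stronger hash such as SHA-256 can be passed instead).

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file's bytes (hypothetical helper name)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def same_contents(original_path, copy_path, algorithm="md5"):
    """True if both files hash to the same digest, i.e. are byte-for-byte alike."""
    return file_digest(original_path, algorithm) == file_digest(copy_path, algorithm)
```

The digest covers only the bytes inside the file, so it speaks to the embedded data and the text, not to dates kept by the operating system.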

(11) BC made the following comment | Nov 29, 2006 7:05:36 PM | Permalink

Note that my above post applies only if you're concerned with file-level integrity, and don't particularly care about, for example, the "last modified" date stamps that appear in your Windows Explorer. I know of no way to ensure OS-level integrity of downstream copies; I'll defer to Phelps there.
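To make that distinction concrete, here is a small Python sketch, offered as an illustration under stated assumptions rather than as a description of what any particular copying tool does. The function name `copy_and_compare` is hypothetical. It uses the standard library's `shutil.copyfile` (contents only) and `shutil.copy2` (contents plus, where the platform allows, file-system timestamps) and reports whether the bytes and the "last modified" time survived the copy.

```python
import os
import shutil


def copy_and_compare(src, dst, preserve_times=False):
    """Copy src to dst, then report what survived (hypothetical helper)."""
    if preserve_times:
        shutil.copy2(src, dst)      # also tries to carry over timestamps
    else:
        shutil.copyfile(src, dst)   # copies the file's bytes only
    with open(src, "rb") as f_src, open(dst, "rb") as f_dst:
        same_bytes = f_src.read() == f_dst.read()
    # Compare whole seconds to sidestep sub-second rounding differences.
    same_mtime = int(os.stat(src).st_mtime) == int(os.stat(dst).st_mtime)
    return {"same_bytes": same_bytes, "same_mtime": same_mtime}
```

In both modes the bytes should match; only the file-system dates are at the mercy of how the copy was made.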

(12) ed made the following comment | Nov 29, 2006 8:59:36 PM | Permalink

Hmmmm.

For legal purposes I'd suggest following what Phelps pointed out. Otherwise you run into the problem where you cannot adequately verify that some unknown process, virus or malicious program didn't modify the file in question.

So the only way to be certain is to make a disk image of the hard drive as your reference copy. Then push the disk image onto a new clean hard disk so you can then easily access the files. Note that this doesn't change or corrupt the reference disk image so you can do it again, and again and again if necessary.

Additionally you will also have the side benefit of being able to scan the disk for any fragments that are related to the file you're looking at. Particularly if someone pruned, changed or copied the document at some point.

What computers do is leave unused portions of harddisk space uncleaned. Instead a specific portion will be marked as either used or set for re-use. But until the computer actually writes something into that specific portion it'll have fragments of whatever was the last thing stored on it.

And this is an extremely important point since many harddisks today have become much larger.

In an age of 50MB harddrives it was pretty easy to fill it up, and so any portions marked for re-use would often be reused. But with personal harddrives in excess of 300GB and many servers storing in the terabyte range, there's a lot more opportunities for older data to remain on the harddrive.

The other thing is that I'm not certain you should fixate on Microsoft Word. Microsoft Office is the standard but there are a lot of different alternatives out there in use including Star Office. And since their acceptance is growing, it's likely you'll encounter these as well.

Generally once you've preserved a reference disk image copy of the harddisk the main thing is to not use the application in question to open any specific file. I.e. don't use Microsoft Word to open a Word document when searching for meta-data. This is because you may not know in advance what the application will do to that meta-data.

So what you'll probably end up looking for is a meta-data scanner or analyser. Something perhaps like this link.

(please note that this is not a recommendation. Just googled it as an example)

The other thing to remember also is that Microsoft Office has been shifting away from simple document creation and moving towards *collaboration* and workgrouping. This is where documents aren't just created, they are passed around and modified by several people and then the resulting changes are merged back into the original document. But in many cases the original document retains the separate modifications for reference purposes.

In this sort of instance then you might have to use the relevant program to open the file and view this data. So having that reference disk image is pretty much paramount.

*shrug* all I could think of off the top of my head. Good to have you back.

(13) J. R. Ford made the following comment | Nov 29, 2006 9:00:16 PM | Permalink

Actually, you go for their backup tapes. Get copies of the tapes for specific dates. These tapes can be restored onto a clean drive, viewed, altered, etc. and the data on the tapes will remain the same.

Backups are normally performed on a binary basis and will not affect the file when copied and restored. By having the copies of several consecutive days, an even further chain will be established in addition to the embedded data.

Any law firm with a decent IT department will have daily backup tapes.

(14) Jim Thompson made the following comment | Nov 29, 2006 9:07:55 PM | Permalink

Short answer: hire the forensic expert as described in another comment.

There is some confusion, as you alluded, in the distinction between metadata and embedded data. Metadata is a slippery term and in general terms just means "data about data". In the context of computer file systems, it's generally understood to mean that data that's stored in the file system, as opposed to in the file itself. Data stored in the file itself is embedded data. Not everyone agrees on these definitions, but they're a basis for discussion.

Examples of metadata include the owner of the file, the date the file was created, the date it was last modified, and on some file systems, the date it was last accessed. These kinds of metadata are common to most file systems. Metadata is not always copied with the file!
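To make that last point concrete, here is a minimal sketch in Python (my choice of language for illustration, not anything used in the case). It assumes a hypothetical file named affidavit.doc, backdates it, and then copies it two ways using the standard library: shutil.copyfile copies only the bytes, while shutil.copy2 also tries to carry the timestamps along.

```python
import os
import shutil
import tempfile
import time


def modified_time(path):
    """Return the file system's last-modified timestamp in whole seconds."""
    return int(os.stat(path).st_mtime)


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical file name and contents, for illustration only.
    original = os.path.join(workdir, "affidavit.doc")
    with open(original, "wb") as f:
        f.write(b"draft text")

    # Backdate the original so the difference between copies is visible.
    backdated = 1100000000  # an arbitrary instant in November 2004
    os.utime(original, (backdated, backdated))

    plain_copy = os.path.join(workdir, "plain_copy.doc")
    careful_copy = os.path.join(workdir, "careful_copy.doc")
    shutil.copyfile(original, plain_copy)  # copies the bytes only
    shutil.copy2(original, careful_copy)   # also tries to copy timestamps

    original_mtime = modified_time(original)
    plain_mtime = modified_time(plain_copy)
    careful_mtime = modified_time(careful_copy)

    print("original:    ", time.ctime(original_mtime))
    print("plain copy:  ", time.ctime(plain_mtime))
    print("careful copy:", time.ctime(careful_mtime))
```

The plain copy picks up today's date as its modification time even though its contents are identical, which is exactly the kind of silent metadata loss a casual copy can cause.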

Embedded data is inserted by the application into the file's stream of bytes. It's usually in the form of properties. A good example is an MP3 file that has embedded tags (ID3) describing the song's name, artist, date of recording, album, and maybe even album art. PDF and Word files have their own embedded data. Embedded data is always copied with the file.

Cryptographic checksums (hashes) like MD5 or SHA sums will show a change of even a single bit of the file's contents. A digital signature, as one comment described, will do the same because one element of a digital signature is a cryptographic hash.
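A small Python sketch of the single-bit claim, using SHA-256 from the standard hashlib module. The sample text and the helper names are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_lowest_bit_of_first_byte(data: bytes) -> bytes:
    """Return a copy of the data that differs from it in exactly one bit."""
    return bytes([data[0] ^ 0x01]) + data[1:]


# Made-up contents standing in for a real file's bytes.
original = b"Sworn statement of the witness."
altered = flip_lowest_bit_of_first_byte(original)

original_digest = sha256_hex(original)
altered_digest = sha256_hex(altered)

print("original:", original_digest)
print("altered: ", altered_digest)
print("digests match:", original_digest == altered_digest)
```

The two inputs differ by one bit, yet the digests share no obvious resemblance, so comparing digests is a quick way to tell whether two copies are byte-for-byte identical.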

But let's look beyond that one copy of the file.

If Doe is a fairly large company, it's possible that multiple copies of that document existed over time as its content was developed. Copies may exist as attachments on email. Copies may exist on more than one workstation, as well as on the file server where the master copy resides. Most importantly, if Doe has a reasonably competent IT department, backups will have been made over time. Doe may have backup tapes you can look at. They may even have archived backups stored at a 3rd-party data storage company (most companies consider regular offsite backups an important strategy for disaster recovery). Backup systems preserve a file's metadata as well as its embedded data. Go after Doe's backups, mail servers (and their backups), and the workstation of anyone who might have viewed or edited that document.

Hire yourself a good forensic data recovery expert -- he should know all the places to dig to root out as much of the history of that document as can be discovered, and he'll know how to preserve the metadata in a way that can be used as evidence in court.

Good luck.

(15) ed made the following comment | Nov 29, 2006 9:08:39 PM | Permalink

Hmmm.

Also Windows, the most likely OS, has a serious internal security problem. Basically Windows makes the assumption that any program that is running is an authorized program. And so any running program then has full access to all of the internal workings of the Windows OS. This is how viruses get to be a real problem.

Here's an example:
Let's say I write a small malicious program that is intended to run under Windows but in a hidden mode where it won't show up in the Task Manager. I code this program so that it "hooks" into the Windows keyboard and disk services. So any time the user types on the keyboard it'll notify my program and tell it what's being typed. In addition anytime a file is opened, closed, created or deleted a notification will be sent to my program for those operations as well.

Say my program is coded so that if any Microsoft Office document is to be copied to another computer, device or media, my program will kick in *first* and wipe the meta-data.

This isn't exactly a difficult scenario since this emulates how viruses are designed to work. But the problem for you, if you do anything on the source's computer during discovery, is that my program can be triggered to do almost anything and it would be almost impossible for you to find out.

*shrug* I could contaminate the meta-data and point the finger at someone else. Or delete the contents of the file. Or corrupt the file so you'll think that it's a dead-end. etc etc etc.

Just an fyi.

(16) Norman Yarvin made the following comment | Nov 29, 2006 9:21:19 PM | Permalink

There's no point in using checksums or digital signatures. If they're inclined to modify the file, they'll do so before checksumming it or signing it (or before you get to it to checksum it yourself, if that applies).

Just about any program that doesn't understand the file format of the file can be trusted to copy it correctly. Only a word processor, for instance, will know how to muck with the metadata inside a word processing file, or will bother trying.

Preserving the operating system timestamp(s) is more difficult, and requires, as Phelps indicated, an image of the hard drive. (There are a few programs, such as Unix "tar", which preserve those timestamps without having to take an image of the entire drive; but Windows is the norm, and I don't know what program one would use there.) But I don't know if you'd have the latitude to request the complete contents of their hard drive, as it might contain all sorts of other confidential information not pertinent to your lawsuit. Also, the operating system's metadata is much less extensive than the metadata in the file: it consists mainly of one or two timestamps per file, not the voluminous heap of data (including the make and model of the user's printer) which I've seen Word shovel into its files.

(17) DRJ made the following comment | Nov 29, 2006 10:04:19 PM | Permalink

This is fascinating but I'm afraid I'm going to have metadata nightmares.

(18) BC made the following comment | Nov 29, 2006 10:17:29 PM | Permalink

Norman, the point of checksumming is to capture the hash of the file before you put it into circulation, so that you can verify the integrity of downstream files against that "master" hash. If you don't have control over the original copy of the file, then yes, checksumming is worthless -- but if you don't have control over the original copy of the file, you're hosed anyway, because there's absolutely no way for you to establish what the original's bits actually were.
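As a rough sketch of that workflow in Python: record a "master" hash while you still control the original, then check any downstream copy against it. The file names, contents, and the helpers file_sha256 and matches_master are hypothetical, invented for this illustration.

```python
import hashlib
import os
import shutil
import tempfile


def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_master(path, master_hash):
    """Return True if the file's contents hash to the recorded master hash."""
    return file_sha256(path) == master_hash


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical master copy, hashed before it goes into circulation.
    master = os.path.join(workdir, "master.doc")
    with open(master, "wb") as f:
        f.write(b"contents of the original document")
    master_hash = file_sha256(master)

    # A faithful downstream copy.
    faithful = os.path.join(workdir, "faithful.doc")
    shutil.copyfile(master, faithful)

    # A downstream copy that someone altered by appending one space.
    tampered = os.path.join(workdir, "tampered.doc")
    shutil.copyfile(master, tampered)
    with open(tampered, "ab") as f:
        f.write(b" ")

    faithful_ok = matches_master(faithful, master_hash)
    tampered_ok = matches_master(tampered, master_hash)

    print("faithful copy matches master:", faithful_ok)
    print("tampered copy matches master:", tampered_ok)
```

Note that this only checks the file's bytes. File system timestamps live outside the file, so they are not covered by the hash.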

(19) Robert made the following comment | Nov 29, 2006 10:25:38 PM | Permalink

Welcome back. I forlornly opened my Beldar link for months after the Miers mess.

(20) ech made the following comment | Nov 29, 2006 10:32:59 PM | Permalink

By the way, there are programs out there that will strip out the metadata embedded in Microsoft Office documents.

At my office, most of the Word metadata will be useless, as the way our systems are built means that everyone has some of the same metadata embedded (i.e. the owner of the software, etc.).

I have had a few documents passed from us by a customer that still had all the editing history and collaboration comments embedded in it. I routinely turn on the change tracking feature for documents I get to see the change history. You sometimes find the funniest comments there.

(21) Robert made the following comment | Nov 29, 2006 10:39:48 PM | Permalink

On to the subject of your post.
Phelps is right. IMHO with enough effort, even the best forensics can be spoofed. But that is a lot of effort.

(22) Norman Yarvin made the following comment | Nov 30, 2006 12:34:18 AM | Permalink

Since there is no "circulation" here, there is no point in checksumming here. We're just talking about getting a copy of the data to look at, not publishing it on the web or something. And yes, from a strict standpoint, we start out without control over the data, so we're hosed... but in practice companies do not routinely falsify documents they furnish in response to lawsuits. The incentives just aren't there: it's the corporation's money at stake, not their own cash; and they still face (at least in theory) penalties for perjury if they get caught in a lie. Also, it is quite time-consuming to put together a thorough deception. It's still worth looking for petty fraud, and nailing them on it, but one doesn't have to approach this like one does national security information.

(23) laocoon made the following comment | Nov 30, 2006 8:30:51 AM | Permalink

folks,

Using checksums does let one test bit-by-bit integrity, BUT ...

(A) Does not establish responsibility. Using a digital signature does do that, because a particular individual does have to sign.

(B) Does not stop programs from tweaking the metadata. Encryption does that, because once it is encrypted, it is just a bunch of meaningless text. Copying it around from place to place does not change it (and any metadata on the cipher text is irrelevant), so that the decrypted text will have exactly the same metadata as was originally encrypted.

So the 'sign and encipher' approach appears to deal with the problems raised above with checksums, doesn't it?

(24) Carl Pham made the following comment | Nov 30, 2006 1:55:53 PM | Permalink

The answer to your question is an unqualified NO.

Unless you have the physical hard drive in your possession, or you can hire someone who can prove to the Court's satisfaction that he made an exact bit-for-bit copy of it for you (cf. Phelps' comment above), there's no way you can prove anything with the metadata in the file. There are about a bazillion ways in which it could be modified after the file was first created (or last modified), both on purpose and by accident, both at the user app level and the OS level.

Furthermore, I suggest a checksum, digital signature, impressive scrolled wax seal et cetera, testifying as to the authenticity of any copy, is as usual only as good as your trust in the checksummer, signer or sealer. In which case, why not have that person testify directly as to his knowledge of the file?

(25) laocoon made the following comment | Nov 30, 2006 2:42:44 PM | Permalink

Carl,

Got it: we're addressing different problems. I was thinking of how to make sure you could exactly preserve the meta/embedded data, without letting SW corrupt it. Sign and encrypt solves my problem - but certainly does not address the problem you address.

But it seems to me that having the physical hard-drive in your possession does not prove anything, either. Since you have it, you could have changed it. There's no evidence that it is even the correct hard-drive - except as far as you trust the expert to verify the drive's ID correctly, to not change it, and so on. Again, it comes down to trust in that person.

The core problem is that bits are easily changed, quite unlike DNA or powder residue. And bits related to authenticity are just as easily changed.

(26) Crank made the following comment | Nov 30, 2006 5:29:41 PM | Permalink

I have nothing to add on the bleg, but that is indeed a great story. I've had a couple of crosses go like that (can't discuss details here except to say that they resulted from things we discovered through our own digging that the other side should have produced to us and didn't) and there is hardly a better feeling in the world.

Best one I ever saw was a criminal case a few years back - I forget all the details now but basically the 'whistleblower' witness said our client, a fellow employee at a bank, had improperly approved certain transfers. The lead lawyer for us basically asked something like this (very truncated version - this went on for like 15 minutes):

Q: But you would never approve something like that yourself, would you?

A: No, of course not.

Q&A: [Shows witness a document signed by the witness in which he did precisely that, but in a fairly small sum of money, and witness admits that he did do that]

Q: But that was only a small amount of money. You wouldn't do that for a lot of money, like X?

A: No, never.

Q&A: [Shows witness a document signed by the witness in which he did precisely that, in a larger sum of money, and witness admits that he did do that]

This goes on up the ladder about three times, with the witness' denials getting more tentative as he realizes that the branch he is on has been completely sawed off. It was a thing of beauty.

(27) LeeGill made the following comment | Nov 30, 2006 8:19:55 PM | Permalink

How expensive is "hideously" and what kind of case will justify same? Is it worth it just to find more cross-examination material? Seems to me the witness was thoroughly nailed at the TRO hearing. How much latitude do you think the judge would have given you, especially when where you are going is into the opposing lawyer's computer? Not saying it wouldn't be fun to nail the lawyer too if he assisted or promoted the fraud, just that it isn't a national security case. Now, if we could convince the US atty to go after the Times' and their reporters' hard drives, tape backups, etc., that would certainly justify the expense.

(28) Beldar made the following comment | Nov 30, 2006 11:50:44 PM | Permalink

LeeGill, those are good questions and I'm not sure I have answers worthy of them.

I think that my co-counsel, "Mr. White," had decided to probe this subject "on the cheap." I'm sure he knew of the possibility of engaging experts. We've all read of the FBI and, less often, litigants in civil cases getting court approval for extraordinary measures like seizing a party's computers to image/clone them and prevent spoliation.

I haven't talked about those issues with him, but my guess is that his document production request was intended to get such useful information as we could "on the cheap." In other words, I think he may have been satisfied with whatever embedded data could be gotten from that normally created by Microsoft Word in the "native" word-processing file, without (for example) also trying to probe possible back-ups from the law firm's server and so forth. Part of that determination he may have made based on cost concerns; some of it may have been influenced by our view of the likely ethics of Doe Inc.'s law firm. (We both hold that firm in high regard, and while there's always the possibility of a rogue or inexperienced individual lawyer, we thought it hugely improbable that the firm as a whole would misbehave.)

And candidly, I was somewhat less enthusiastic about the prospect of pursuing this than Mr. White was. The harder we chased this issue, the more vehement Doe Inc.'s lawyers would become in defending themselves. And as you point out, where we'd ended the temporary injunction hearing was a pretty good spot. The inferences I could draw from that record might well have turned out to have been better than what a deeper investigation might have revealed.

Crank, that's also a good story, and it illustrates something I've preached for a long time: Your biggest single advantage as the cross-examining lawyer, as compared to even a very smart witness, is your ability to always put the next question — and with that ability, the opportunity to structure your examination in ways to tell the story as you want it told.

ech, one of the major emphases of Prof. Hricik's CLE course was how lawyers' ethical duties to preserve their clients' confidences require them to be alert to what embedded data is going out of their offices. The classic example (and repeatedly-true story) is of lawyers who send out Microsoft Word documents where the revision trail reveals what the client's "real bottom dollar" on settlement is, or what they thought about asking for during prior drafts.

To all who've provided technical detail: I'm grateful, and when I next face this issue, I'll study these points in considerable detail. It's clear enough that how much technical sophistication I'll need then will depend on the circumstances and just how devious I think the other side may be.

(29) Dan S made the following comment | Dec 8, 2006 1:19:45 PM | Permalink

Welcome back, Beldar.

And a great story.

A trick we use to remove embedded data from large group projects (did this often when working in a government agency as a consultant) is to create a fresh document and cut & paste the final text from the repeatedly reworked one. Do have to be careful with formatting if you don't use a standard and uniform template, though.

The sheer savings in file size are rather amazing too. It's pretty routine in those environments to drop to 5 or 10% of the prior size.

And, yeah, there are programs that do this for you.

If you ever want to see all the garbage left in a Word file, just open it with a plain text editor (or a hex editor). A lot is formatting data, and not easily human readable, but a lot is readable also.
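If you'd rather not squint at a hex editor, a few lines of Python can pull out the readable runs for you, much like the Unix "strings" utility. This is only a sketch: the sample bytes, the names in them, and the printable_runs helper are all made up for illustration, and a real Word file will yield far more noise. It looks for plain ASCII text and for the two-bytes-per-character (UTF-16) text that binary files often contain.

```python
import re


def printable_runs(data: bytes, min_length: int = 4):
    """Return readable text runs found in binary data.

    Looks for runs of printable ASCII, then for runs of the same
    characters stored as UTF-16LE (each followed by a zero byte).
    """
    ascii_pattern = rb"[\x20-\x7e]{%d,}" % min_length
    wide_pattern = rb"(?:[\x20-\x7e]\x00){%d,}" % min_length

    narrow = [m.decode("ascii") for m in re.findall(ascii_pattern, data)]
    wide = [m.decode("utf-16-le") for m in re.findall(wide_pattern, data)]
    return narrow + wide


# Made-up stand-in for a binary word-processing file: junk bytes with
# one ASCII string and one UTF-16LE string buried inside.
sample = (
    b"\xd0\xcf\x11\xe0\x00\x01\x02"
    + b"Author: J. Doe"
    + b"\x00\xff\xfe"
    + "Last saved by: R. Roe".encode("utf-16-le")
    + b"\x03\x04"
)

found = printable_runs(sample)
for text in found:
    print(text)
```

To try it on a real file, read the file's bytes with open(path, "rb").read() and pass them to printable_runs.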

It's pretty interesting to see a "competitive" document with all its edits, comments, and history present, as others have noted.

The comments to this entry are closed.